We've been giving the contains(fuzzy) clause a heavy workout lately, and we're finding some very odd values in the score() calculation.
I don't have a reproducible example that I can share right now, unfortunately, but I wanted to post a description of what we are seeing, in case anyone else has seen something similar and has any ideas for a workaround.
We're on Hana Revision 97.
We've got two tables and we're fuzzy matching on 4 string (not text) fields. At this point in our testing, we're putting the smaller table into a cursor, looping through it record by record, and running a query with a contains clause on each of the 4 fields. Each field uses the same parameters on the contains clause, like this: contains(field_value, cursor.value, fuzzy(0.8, 'emptyScore=0.7')), and then we're also pulling the score().
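To make the shape of that per-record query concrete, here is a minimal sketch that builds the SQL text. The table name, column names, and the use of AND between the four clauses are assumptions for illustration; only the contains/fuzzy/score() pattern comes from the description above.

```python
# Hypothetical column names; the real fields are not given in this post.
FIELDS = ["field_1", "field_2", "field_3", "field_4"]


def build_match_query(table, fields, min_score=0.8, empty_score=0.7):
    """Return SQL text with one '?' placeholder per fuzzy-matched field."""
    options = f"'emptyScore={empty_score}'"
    clauses = [
        f"CONTAINS({field}, ?, FUZZY({min_score}, {options}))"
        for field in fields
    ]
    # Assumption: the four clauses are combined with AND, so a record only
    # matches when every field clears the 0.8 threshold.
    return (
        f"SELECT SCORE() AS match_score, * FROM {table} WHERE "
        + " AND ".join(clauses)
    )


# "big_table" is a made-up name for the larger of the two tables.
query = build_match_query("big_table", FIELDS)
print(query)
```

Each `?` would be bound to the matching value from the current cursor record.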
Here are some of the bits of math we've seen so far.
If any fields have both sides as nulls (not blanks), we get 0.5 for the score, no matter how many other fields match or what the scores of the other matches are.
The only exception is that it doesn't cause "bad" matches: if any of the fields are below the 0.8 threshold, the record doesn't come through as a match.
But if we have 2 or 3 exact matches, and 1 or 2 null vs null comparisons, we get 0.5 every time. If there are close matches where one of the comparisons would evaluate to 0.9 instead of 1, the number actually goes UP from 0.5, so 2 exact matches, 1 close match, and 1 null vs null might give us 0.512.
If we have any fields where one side is null (or blank) and the other side isn't, the score average is wrong. For example, two fields are exact matches and two fields have one side null and one side not null. That should be (1 + 1 + 0.7 + 0.7) / 4 = 0.85, but we get 0.78.
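The arithmetic behind that expectation can be checked with a few lines. This is only a sketch: it assumes the overall score is a plain average of the per-field scores, and it reads the figures above as fractions of 1 (0.85 expected, 0.78 observed).

```python
def expected_score(field_scores):
    """Plain arithmetic mean of the per-field scores (our assumption)."""
    return sum(field_scores) / len(field_scores)


# Two exact matches (1.0 each) and two one-sided empties (emptyScore=0.7 each).
expected = expected_score([1.0, 1.0, 0.7, 0.7])
observed = 0.78  # value reported in this post

print(round(expected, 2))             # 0.85
print(round(expected - observed, 2))  # 0.07
```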
If we change the cursor.field values from nulls to blanks, we get similar behavior.
If we change the field_value side to blanks, or both sides to blanks, suddenly the math works. For two exact matches and two blank vs blank, we get 0.85.
However, even with blanks, if we have two exact matches and two values that are blank on one side of the comparison, we still get 0.78 instead of 0.85.
Has anyone ever seen anything like this? Any suggestions for a workaround? We've changed the nulls on the field_value side, and that got rid of the 0.5 bug, but we haven't found any kind of workaround for the other side of the issue.
One more wrap-up example. If we have these 4 fields:
MONKEY, PAW, 1234, 987654 vs MONKEY, PAW, ,
we get 0.78
BUT
MONKEY, PAW, , vs MONKEY, PAW, ,
that gets 0.85
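For comparison, here is a tiny model of what we expected for those two rows. It is a hypothetical rule, not HANA's actual algorithm: any field that is empty on either side scores the emptyScore, exact matches score 1.0, and the row score is the plain average.

```python
EMPTY_SCORE = 0.7  # the emptyScore value from the contains clause


def naive_field_score(left, right):
    """Hypothetical per-field rule; real fuzzy scoring is not modelled."""
    if not left or not right:  # None or '' on either side
        return EMPTY_SCORE
    return 1.0 if left == right else 0.0


def naive_row_score(left_row, right_row):
    """Average the per-field scores and round to two decimals."""
    scores = [naive_field_score(a, b) for a, b in zip(left_row, right_row)]
    return round(sum(scores) / len(scores), 2)


one_side_empty = naive_row_score(
    ["MONKEY", "PAW", "1234", "987654"], ["MONKEY", "PAW", "", ""]
)
both_sides_empty = naive_row_score(
    ["MONKEY", "PAW", "", ""], ["MONKEY", "PAW", "", ""]
)
print(one_side_empty, both_sides_empty)  # 0.85 0.85
```

Under this rule both rows come out the same, which is why the gap between the two observed results looks wrong to us.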
And I just came back to add one more wrinkle. If we change it so the field_value is blank instead of null, then the comparison with cursor.value comes back as a perfect match whenever that field is either a null or a blank, which I guess is what the documentation says, sort of. Except that it seems to be completely inconsistent about when that holds. Blank vs blank gets you 1.0. Null vs blank in one direction gets you 1.0. Blank vs null in the other direction gets you 0.5 for your total score, no matter what the other matches are. And null vs null gets an individual score of 0.5 instead of the emptyScore, which drops the record out of the matches because of the 0.8 fuzzy threshold.