Reputation: 6874
I've been searching but haven't found how to do this in Refine.
I've got two columns of unique IDs. For each a in A, I want to find the top 10 closest matches in B.
My backup plan is to just use Levenshtein to iterate ... but Refine has such a nice interface and many more algorithms implemented that I was hoping to be able to do some of the work using it.
Or is there another tool for doing this?
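For reference, the Levenshtein backup plan could look roughly like the sketch below. It is plain Python with no Refine involved, and the function names `levenshtein` and `top_matches` are made up for illustration. It compares every a against every b, so it is only practical for modestly sized columns.

```python
import heapq


def levenshtein(s, t):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def top_matches(a, candidates, k=10):
    """Return the k candidates with the smallest edit distance to a."""
    return heapq.nsmallest(k, candidates, key=lambda b: levenshtein(a, b))
```

Calling `top_matches(a, B)` for each a in A gives the 10 nearest IDs from B by edit distance.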
Upvotes: 1
Views: 372
Reputation: 1787
Did you know you can use clustering algorithms like fingerprint or ngramFingerprint (source) outside of the clustering interface in Refine?
Using your IDs field, create a new column based on this column with the following expression: ngramFingerprint(value)
You can now cross with your other data set on this new column, provided the other data set has a column computed with the same expression. Since the fingerprint ignores case, punctuation and whitespace, this might help to get more matches.
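To make the idea concrete outside of Refine, here is a rough Python approximation of an n-gram fingerprint key and a join on that key. It is a sketch of the general technique, not Refine's exact implementation: the normalization details, the default of n=2, and the names `ngram_fingerprint` and `match_by_key` are assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def ngram_fingerprint(value, n=2):
    """Approximate n-gram fingerprint: normalize, take unique n-grams, sort, join."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase and drop everything that is not a letter or digit.
    text = re.sub(r"[^a-z0-9]", "", text.lower())
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def match_by_key(column_a, column_b, n=2):
    """For each value in column_a, list the column_b values sharing its key."""
    index = defaultdict(list)
    for b in column_b:
        index[ngram_fingerprint(b, n)].append(b)
    return {a: index.get(ngram_fingerprint(a, n), []) for a in column_a}
```

With this sketch, "AB-1001" and "ab 1001" produce the same key and so end up matched. Note that this only finds values whose keys are identical; it does not rank the 10 nearest candidates the way a distance measure would.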
Upvotes: 1