Elasticsearch minhash prefix query with wildcards?

Question

I have a minhash field generated for some text (using the MinHash algorithm). Is it possible to combine the prefix query with wildcards? The problem is that the hashed string values depend on the position of the shingles/tokens in the content, so the first few characters (the prefix) might not always match exactly for similar content. Would it be possible to put a wildcard in front of the prefix in a query, e.g. *3AF8659GJ?

EDIT: I guess I wasn't thinking hard enough about the problem. The differences can be anywhere in the hash string (depending on where in the content the texts differ). So I guess the best, and possibly only, way would be edit distance with some threshold.

E.g. put all hashes into an array, sort them in lexical order (or how would you sort hex strings?), then compare each hash only with the next k documents, checking whether the edit distance stays within the threshold, and put the duplicates in a separate array.
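
Roughly what I have in mind, as a minimal Python sketch outside of Elasticsearch. The function names, the window size `k`, and the `threshold` value are all my own placeholders, and I am assuming plain Levenshtein distance and Python's default string ordering for the lexical sort; this is not an Elasticsearch feature.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def find_near_duplicates(hashes: List[str], k: int = 3,
                         threshold: int = 2) -> List[Tuple[str, str]]:
    """Sort hashes lexically, then compare each one with its next k neighbours.

    k and threshold are hypothetical tuning values, not recommendations.
    """
    ordered = sorted(hashes)
    duplicates = []
    for i, h in enumerate(ordered):
        # Only look at the next k entries in sorted order.
        for other in ordered[i + 1:i + 1 + k]:
            if edit_distance(h, other) <= threshold:
                duplicates.append((h, other))
    return duplicates


if __name__ == "__main__":
    sample = ["3AF8659GJ", "3AF8659GK", "9ZZ000000", "3AF8650GJ"]
    print(find_near_duplicates(sample, k=2, threshold=1))
```

One thing I am unsure about: lexical sorting only puts hashes next to each other when they share a prefix, so a difference near the start of the string could push two similar hashes outside the window of k neighbours.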

Answers (1)
