Petr Savchenko

Reputation: 194

Elasticsearch substring search

I need to implement search by substring. It should work like “CTRL + F”, which highlights a word when the search text matches a substring of it.

The search is going to be performed by two fields only:

However, the number of records is going to be fairly large, about a million.

So far I’m using a query string search with the keywords wrapped in wildcards, but that will definitely lead to performance problems once the number of records starts growing.

Do you have any suggestions for a more performant solution?

Upvotes: 1

Views: 1761

Answers (1)

Bhavya

Reputation: 16172

Searching with leading wildcards is going to be extremely slow on a large index. From the documentation:

Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.

As the documentation says, wildcard queries are very slow. If you want the search to be fast at query time, it is better to use an n-gram strategy. For partial matches, word prefixes, or arbitrary substring matches, use the ngram tokenizer: the substrings are indexed ahead of time, so a substring search becomes an ordinary term lookup instead of a scan over all terms.

The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length.
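To make that description concrete, here is a small pure-Python imitation of the behaviour. It is only an illustration, not Elasticsearch's implementation: the function name `ngram_tokens` is made up, it assumes words are split on anything that is not a letter or digit, and it lowercases the input the way a separate lowercase filter would.

```python
import re


def ngram_tokens(text, min_gram=3, max_gram=4):
    """Illustrative only: emit character n-grams for each word in text."""
    tokens = []
    # Assumption: letters and digits form words; everything else is a separator.
    # Lowercasing stands in for a lowercase token filter.
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


print(ngram_tokens("Quick fox", min_gram=3, max_gram=4))
# ['qui', 'quic', 'uic', 'uick', 'ick', 'fox']
```

Because "uic" is stored as its own term, a search for "uic" finds the document containing "Quick" without any wildcard. Note that words shorter than `min_gram` produce no tokens in this sketch.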

Please go through this SO answer, which includes a working example of a partial match using n-grams.
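For orientation, an index definition along these lines could look roughly like the sketch below. This is not taken from the linked answer. The field names `title` and `description`, the analyzer and tokenizer names, the gram sizes, and the helper functions are all placeholders, since the question does not say which two fields are searched. The code only builds the request bodies as plain dicts; sending them with a client is left out.

```python
import json


def build_index_body(fields, min_gram=3, max_gram=10):
    """Build create-index settings/mappings with an ngram analyzer (sketch)."""
    return {
        "settings": {
            # The allowed gap between min_gram and max_gram defaults to 1,
            # so it has to be raised for a wider range of gram sizes.
            "index": {"max_ngram_diff": max_gram - min_gram},
            "analysis": {
                "tokenizer": {
                    "substring_tokenizer": {  # hypothetical name
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "substring_analyzer": {  # hypothetical name
                        "type": "custom",
                        "tokenizer": "substring_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            },
        },
        "mappings": {
            "properties": {
                name: {
                    "type": "text",
                    "analyzer": "substring_analyzer",
                    # Assumption: the query text is not split into n-grams,
                    # so each query word must equal one indexed gram.
                    "search_analyzer": "standard",
                }
                for name in fields
            }
        },
    }


def build_search_body(text, fields):
    """Plain multi_match query; no wildcards needed once n-grams are indexed."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


fields = ["title", "description"]  # placeholder field names
print(json.dumps(build_index_body(fields), indent=2))
print(json.dumps(build_search_body("uic", fields), indent=2))
```

With this setup, query words shorter than `min_gram` or longer than `max_gram` will not match anything, and a larger gram range makes the index bigger, so the sizes need tuning against the real data.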

Upvotes: 1
