tina

Reputation: 312

How to do incremental / search-as-you-type full-text search on a 5-million-record set using Elasticsearch

I'm using Elasticsearch on a huge dataset of all Wikipedia article names; there are approximately 5 million of them. The database field name is articlenames.

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
   "settings":{
      "analysis":{
         "filter":{
            "nGram_filter":{
               "type":"edgeNGram",
               "min_gram":1,
               "max_gram":20
            }
         },
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }                                                                                                                   
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            },
            "whitespace_analyzer":{
               "type":"custom",
               "tokenizer":"whitespace",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings":{                                                                         
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer"
            }
         }
      }
   }
}'
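For reference, the nGram_analyzer above stores every leading-edge prefix of each word. A minimal Python sketch of the tokens it would produce for a title (an approximation of the analyzer, not Elasticsearch itself):

```python
import re

def edge_ngram_tokens(text, min_gram=1, max_gram=20):
    """Rough model of the nGram_analyzer above: split on
    non-letter/digit characters, lowercase, then emit every
    leading-edge n-gram of each word."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens

print(edge_ngram_tokens("Sachin T"))
# ['s', 'sa', 'sac', 'sach', 'sachi', 'sachin', 't']
```

So a 5-million-title index stores many short, heavily shared terms like "s" and "sa", which is why the index grows quickly and why very short queries match huge numbers of documents.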

I am referencing these links to solve my problem as well, but in vain:

Edge NGram with phrase matching

https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf

My aim is to get results like the ones below for an input query of "sachin t":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps

and for a query of "sachin te":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 

and for a query of "sachin ta":

sachin talwalkar
sachin tawade
sachin taps

and for a query of "sachin ten":

sachin tendulkar
sachin tendulkar centuries

Remember the dataset is huge, and some article names contain special characters, e.g. "Bronisław-Komorowski".

I am able to get output for smaller datasets of up to 100 thousand records, but as soon as the dataset grows to 0.5–5 million records I am unable to get any output.

My query is:

http://127.0.0.1:9200/index_wiki_articlenames/_search?q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50

Upvotes: 2

Views: 1455

Answers (2)

Vishnu

Reputation: 724

I know it's too late, but anybody who's looking for a solution to this can try this query. The mapping and index are correct; the query section just seems to be missing the "and" operator.

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "sachin ten", 
        "operator": "and"
      }
    }
  }
}
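The effect of "operator": "and" can be sketched in plain Python: with "and", every analyzed query token must be found among the document's indexed tokens, whereas the default "or" needs only one. A toy model (not Elasticsearch code), assuming the title was indexed with edge n-grams as in the question:

```python
def edge_ngrams(word, max_gram=20):
    # All leading-edge prefixes of a single lowercase word.
    return {word[:n] for n in range(1, min(len(word), max_gram) + 1)}

# Edge-n-gram index-time tokens for the title "sachin tendulkar"
doc_tokens = edge_ngrams("sachin") | edge_ngrams("tendulkar")

def match(query, operator="or"):
    hits = [token in doc_tokens for token in query.lower().split()]
    return all(hits) if operator == "and" else any(hits)

print(match("sachin ten", "and"))  # True : both 'sachin' and 'ten' are indexed
print(match("sachin xyz", "and"))  # False: 'xyz' was never indexed
print(match("sachin xyz", "or"))   # True : 'or' needs only one token to hit
```

This is also why the default "or" floods the results on a big dataset: a query like "sachin t" matches every title containing either token, including anything whose n-grams include the single letter "t".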

This results in

sachin tendulkar
sachin tendulkar centuries

Upvotes: 0

Sidhant

Reputation: 441

You should try these settings:

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
   "settings":{
      "analysis":{
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }                                                                                                                   
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings":{                                                                         
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}'
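The important change from the question's mapping is "search_analyzer": "standard": titles are still indexed as edge n-grams, but the query string is analyzed into plain lowercase words, so a prefix like "ten" matches the stored n-gram "ten" without the query itself being exploded into n-grams. A rough Python sketch of the two analysis paths (an approximation, not Elasticsearch code):

```python
import re

def index_analyze(text, max_gram=20):
    """nGram_analyzer (approx.): lowercase words, every edge n-gram."""
    tokens = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(1, min(len(word), max_gram) + 1):
            tokens.add(word[:n])
    return tokens

def search_analyze(text):
    """standard analyzer (approx.): lowercase whole words only."""
    return re.findall(r"[a-z0-9]+", text.lower())

indexed = index_analyze("Sachin Tendulkar")

# With operator 'and', every search token must be an indexed term:
print(all(t in indexed for t in search_analyze("Sachin Ten")))  # True
print(all(t in indexed for t in search_analyze("Sachin Ta")))   # False
```

Without the separate search analyzer, the query "Sachin Ten" would itself be split into n-grams ("s", "sa", ..., "t", "te", "ten"), which matches far too broadly on a 5-million-record index.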

Also, when querying, try this query:

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "Sachin T", 
        "operator": "and"
      }
    }
  }
}

Upvotes: 0
