Reputation: 312
I'm using Elasticsearch on a large dataset of all Wikipedia article names, approximately 5 million in total. The database field name is articlenames. This is how I create the index:
curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
"settings":{
"analysis":{
"filter":{
"nGram_filter":{
"type":"edgeNGram",
"min_gram":1,
"max_gram":20,
"token_chars":[
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"tokenizer":{
"edge_ngram_tokenizer":{
"type":"edgeNGram",
"min_gram":"1",
"max_gram":"20",
"token_chars":[
"letter",
"digit"
]
}
},
"analyzer":{
"nGram_analyzer":{
"type":"custom",
"tokenizer":"edge_ngram_tokenizer",
"filter":[
"lowercase",
"asciifolding"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings":{
"name":{
"properties":{
"articlenames":{
"type":"text",
"analyzer":"nGram_analyzer"
}
}
}
}
}'
I also tried the approaches from these links, but in vain:
Edge NGram with phrase matching
https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf
My aim is to get results like the following for the input query "sachin t":
sachin tendulkar
sachin tendulkar centuries
sachin tejas
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps
For the query "sachin te":
sachin tendulkar
sachin tendulkar centuries
sachin tejas
For the query "sachin ta":
sachin talwalkar
sachin tawade
sachin taps
For the query "sachin ten":
sachin tendulkar
sachin tendulkar centuries
Keep in mind that the dataset is huge, and some article names contain special characters or hyphenated words such as "Bronisław-Komorowski".
I am able to get output for a smaller dataset of up to 100 thousand records, but as soon as the dataset grows to between 0.5 and 5 million records, I am unable to get the output.
My query is:
http://127.0.0.1:9200/index_wiki_articlenames/_search?q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50
Upvotes: 2
Views: 1455
Reputation: 724
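I know this is late, but anyone looking for a solution can try the query below. The mapping and index settings are correct; what seems to be missing is the and operator in the query. Without it the default operator is or, so any title that shares a single term with the query is returned.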
GET index_wiki_articlenames/_search
{
"query": {
"match": {
"articlenames": {
"query": "sachin ten",
"operator": "and"
}
}
}
}
This results in
sachin tendulkar
sachin tendulkar centuries
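To see why the operator makes the difference, here is a minimal Python sketch that imitates the analyzer outside Elasticsearch. It assumes the edge_ngram_tokenizer splits on anything that is not a letter or digit and emits lowercased prefixes of 1 to 20 characters; asciifolding is left out for brevity, and the helper names edge_ngrams and matches are made up for illustration. It only models which titles qualify, not how Elasticsearch retrieves or scores them.

```python
import re

def edge_ngrams(text, min_gram=1, max_gram=20):
    """Approximate the edge n-gram analyzer: split on non-letter/digit
    characters, lowercase, and collect each word's prefixes."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])
    return grams

def matches(title, query, operator="and"):
    """True if the title matches the query under the given operator.
    The same analyzer is applied to both sides, as in the question's mapping."""
    indexed = edge_ngrams(title)
    wanted = edge_ngrams(query)
    if operator == "and":
        return wanted <= indexed      # every query term must be present
    return bool(wanted & indexed)     # "or": one shared term is enough

# Sample titles taken from the question
titles = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin talwalkar",
    "sachin taps",
]

and_hits = [t for t in titles if matches(t, "sachin ten", "and")]
or_hits = [t for t in titles if matches(t, "sachin ten", "or")]
print(and_hits)  # only the two "sachin tendulkar..." titles
print(or_hits)   # all five titles, since each shares the term "s"
```

With or, every title that shares even the one-letter term "s" is a hit, which on millions of records pushes the wanted titles out of the first page of results.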
Upvotes: 0
Reputation: 441
You should try these settings:
curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
"settings":{
"analysis":{
"tokenizer":{
"edge_ngram_tokenizer":{
"type":"edgeNGram",
"min_gram":"1",
"max_gram":"20",
"token_chars":[
"letter",
"digit"
]
}
},
"analyzer":{
"nGram_analyzer":{
"type":"custom",
"tokenizer":"edge_ngram_tokenizer",
"filter":[
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings":{
"name":{
"properties":{
"articlenames":{
"type":"text",
"analyzer":"nGram_analyzer",
"search_analyzer": "standard"
}
}
}
}
}'
Then, when querying, use a match query with the and operator:
GET index_wiki_articlenames/_search
{
"query": {
"match": {
"articlenames": {
"query": "Sachin T",
"operator": "and"
}
}
}
}
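The idea behind the search_analyzer setting can be sketched in Python as follows. This is a rough model under stated assumptions, not Elasticsearch itself: the regex stands in for both the edge n-gram tokenizer and the standard analyzer on simple titles, asciifolding is omitted, and the function names (index_terms, search_terms, suggest) are hypothetical. The point it illustrates is that prefixes are generated only at index time, while the query is reduced to whole lowercased words.

```python
import re

def words(text):
    """Lowercased runs of letters/digits; a rough stand-in for how both
    analyzers split simple titles into words."""
    return re.findall(r"[^\W_]+", text.lower())

def index_terms(title, max_gram=20):
    """Index-time terms: every prefix (1..max_gram chars) of every word."""
    return {w[:n] for w in words(title)
            for n in range(1, min(max_gram, len(w)) + 1)}

def search_terms(query):
    """Search-time terms with search_analyzer "standard": whole words only."""
    return set(words(query))

def suggest(titles, query):
    """Titles whose indexed prefixes contain every search term (operator "and")."""
    wanted = search_terms(query)
    return [t for t in titles if wanted <= index_terms(t)]

# Sample titles taken from the question
titles = [
    "sachin tendulkar",
    "sachin tejas",
    "sachin talwalkar",
    "sachin top 60 quotes",
]

print(sorted(search_terms("Sachin T")))  # ['sachin', 't']
print(suggest(titles, "Sachin T"))       # all four titles
print(suggest(titles, "Sachin Ta"))      # ['sachin talwalkar']
```

Because the query is not expanded into n-grams, each typed word has to appear as an indexed prefix, so "Sachin T" becomes just two terms instead of seven.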
Upvotes: 0