The only options I can find are the n-gram and edge n-gram tokenizers, and both of them start from the first letter. I would like to create a tokenizer that produces the following tokens:
For example: 600140 -> 0, 40, 140, 0140, 00140, 600140
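Put differently, the wanted tokens are all the suffixes of the input, shortest first. A minimal Python sketch of that requirement (the function name `suffixes` is hypothetical, used only for illustration):

```python
from typing import List


def suffixes(text: str) -> List[str]:
    """Return every suffix of text, ordered from shortest to longest."""
    # Start slicing at the last character and move the start index leftwards.
    return [text[i:] for i in range(len(text) - 1, -1, -1)]


print(suffixes("600140"))
# ['0', '40', '140', '0140', '00140', '600140']
```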
You can leverage the `reverse` token filter twice, coupled with the `edge_ngram` one:
```
PUT reverse
{
  "settings": {
    "analysis": {
      "analyzer": {
        "reverse_edgengram": {
          "tokenizer": "keyword",
          "filter": [
            "reverse",
            "edge",
            "reverse"
          ]
        }
      },
      "filter": {
        "edge": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 25
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "string_field": {
        "type": "text",
        "analyzer": "reverse_edgengram"
      }
    }
  }
}
```
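To see why reversing twice around `edge_ngram` yields suffixes, here is a rough Python simulation of the analyzer chain. It only illustrates the logic and is not Elasticsearch code; `edge_ngrams` and `reverse_edgengram` are hypothetical helpers that mirror the settings above (`keyword` tokenizer, then `reverse`, `edge_ngram` with `min_gram` 2 and `max_gram` 25, then `reverse` again).

```python
from typing import List


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Prefixes of token whose length lies in [min_gram, max_gram]."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def reverse_edgengram(text: str, min_gram: int = 2, max_gram: int = 25) -> List[str]:
    # keyword tokenizer: the whole input becomes a single token
    token = text
    # first reverse filter: "600140" -> "041006"
    reversed_token = token[::-1]
    # edge_ngram filter on the reversed token: "04", "041", "0410", ...
    grams = edge_ngrams(reversed_token, min_gram, max_gram)
    # second reverse filter: each prefix of the reversed token
    # turns back into a suffix of the original input
    return [g[::-1] for g in grams]


print(reverse_edgengram("600140"))
# ['40', '140', '0140', '00140', '600140']
print(reverse_edgengram("600140", min_gram=1))
# ['0', '40', '140', '0140', '00140', '600140']
```

The second call shows the effect of lowering `min_gram` to 1: the single-character suffix `0` from the question is then produced as well.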
Then you can test it:
```
POST reverse/_analyze
{
  "analyzer": "reverse_edgengram",
  "text": "600140"
}
```
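Which yields the following tokens (`0` is absent because `min_gram` is 2; set it to 1 if you also need single-character tokens):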
```
{
  "tokens" : [
    {
      "token" : "40",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "140",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "0140",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "00140",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "600140",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    }
  ]
}
```