Adi Gabaie

Reputation: 146

Why are position, end_offset and start_offset messed up when using a self-made Tokenizer?

I wrote my own tokenizer: https://github.com/AdiGabaie/tokenizer

I create an analyzer with this tokenizer.

When I test the analyzer, I see the tokens, but "start_offset" and "end_offset" are 0 for every token, and "position" is 1 for all of them.

If I remove the 'autocomplete_filter', the positions are correct (1, 2, 3, ...), but 'start_offset' and 'end_offset' are still 0.

I guess there is something I need to do in my tokenizer implementation to fix this?

PUT /aditryings/
{
    "settings": {
        "index" : {
            "analysis" : { 
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "phrase_tokenizer",
                        "filter" : ["lowercase","autocomplete_filter"]
                    }
                },
                "filter" : {
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20
                    }
                }
            }
        }
    }, 
    "mappings" : {
        "productes" : {
            "properties" : {
                "id" : { "type" : "long"},
                "productName" : { "type" : "string", "index" : "analyzed", "analyzer": "my_analyzer"}
            }
        }
    }
}
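
For example, I test the analyzer with a request along these lines (the sample text is arbitrary):

GET /aditryings/_analyze?analyzer=my_analyzer&text=red+shoes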

Upvotes: 0

Views: 537

Answers (1)

Jaap

Reputation: 724

A tokenizer produces its output through the values of the attributes it registers. Your implementation adds only one:

protected CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);

This attribute carries the token text itself, but Elasticsearch expects more than the token: it also needs start_offset, end_offset and position. By adding an OffsetAttribute and setting its value for each token, you can report the correct start and end offsets:

https://lucene.apache.org/core/4_10_4/core/org/apache/lucene/analysis/tokenattributes/OffsetAttribute.html
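
For example, a minimal sketch (assuming your incrementToken() loop tracks the token's character range in the input as tokenStart and tokenEnd; those names are illustrative):

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

protected CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
protected OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);

// Inside incrementToken(), after copying the token text into
// charTermAttribute, report its character range. correctOffset() maps
// the raw positions through any CharFilters in front of the tokenizer.
offsetAttribute.setOffset(correctOffset(tokenStart), correctOffset(tokenEnd));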

Similarly, PositionIncrementAttribute is used for setting the position:

https://lucene.apache.org/core/4_10_4/core/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttribute.html

Its contract is described in the Javadoc; note that 0 is a valid increment, used for example when a word has multiple stems that should occupy the same position.
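
A minimal sketch of its use:

import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

protected PositionIncrementAttribute posIncrAttribute = addAttribute(PositionIncrementAttribute.class);

// Inside incrementToken(): a regular token advances the position by one.
posIncrAttribute.setPositionIncrement(1);
// An increment of 0 would instead stack the token on the previous
// position, e.g. for a synonym or an additional stem of the same word.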

For inspiration, you can take a look at the StandardTokenizer implementation, which uses all three attributes (as well as a token type attribute):

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java
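
Putting it together, the body of incrementToken() typically follows this pattern (a sketch; findNextToken(), tokenText, tokenStart and tokenEnd stand in for your own scanning logic):

@Override
public boolean incrementToken() throws IOException {
    clearAttributes(); // reset all attributes before emitting the next token
    if (!findNextToken()) { // hypothetical: advance your scanner
        return false;      // no more tokens in the input
    }
    charTermAttribute.setEmpty().append(tokenText);
    offsetAttribute.setOffset(correctOffset(tokenStart), correctOffset(tokenEnd));
    posIncrAttribute.setPositionIncrement(1); // one position per token
    return true;
}

A complete implementation should also override end() to report the final offset once the stream is exhausted, as StandardTokenizer does.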

Upvotes: 0
