Rich Tier

Reputation: 9441

Elasticsearch: tokenize into word pairs

Given the input "quick brown fox jumped", I would like to tokenize it to

["quick brown", "brown fox", "fox jumped"]

But the tokenizers don't seem to offer this feature. This feels like it should be a common feature, so I'm guessing I'm missing something obvious.

I can do ngrams, which gives me the likes of

['q', 'qu', 'qui', 'quic', 'quick']

But I would like to get the combinations for words, rather than letters.

Is this supported?

P.S. The reason I would like to do this is to suggest the next word, similar to how Google suggests the next word to use. I intend to use this tokenizer with the phrase suggester.

Upvotes: 2

Views: 384

Answers (1)

Rich Tier

Reputation: 9441

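Ah, it turns out I want shingles. In Elasticsearch that is the `shingle` token filter, which emits n-grams of whole words rather than of characters.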
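
For anyone landing here, this is a rough sketch of what that might look like, written as a Python dict that could be sent as the index-creation body. The filter name `word_pairs` and the analyzer name `word_pair_analyzer` are made up for illustration; the `shingle` filter type and its `min_shingle_size`, `max_shingle_size` and `output_unigrams` parameters are the documented ones. The `word_shingles` helper is not part of Elasticsearch, it only approximates the expected token output locally (lowercasing and splitting on whitespace instead of running the real analysis chain).

```python
import json

# Index settings sketch. "word_pairs" and "word_pair_analyzer" are
# hypothetical names chosen for this example.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "word_pairs": {
                    "type": "shingle",
                    "min_shingle_size": 2,     # only two-word shingles
                    "max_shingle_size": 2,
                    "output_unigrams": False,  # drop the single words
                }
            },
            "analyzer": {
                "word_pair_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "word_pairs"],
                }
            },
        }
    }
}


def word_shingles(text, size=2):
    """Local approximation of the shingle output (not the real analyzer).

    Lowercases, splits on whitespace, and joins each run of `size`
    consecutive words with a single space.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    # Body to use when creating the index, e.g. PUT /my-index
    print(json.dumps(index_settings, indent=2))
    print(word_shingles("quick brown fox jumped"))
```

Setting `output_unigrams` to false is what keeps the single words out of the token stream; leave it at the default if you also want "quick", "brown", and so on alongside the pairs.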

Upvotes: 4
