Reputation: 9441
Given the input "quick brown fox jumped", I would like to tokenize it into
["quick brown", "brown fox", "fox jumped"]
But the tokenizers don't seem to offer this feature. This feels like it should be a common feature, so I'm guessing I'm missing something obvious.
I can use ngrams, which produce tokens like
['q', 'qu', 'qui', 'quic', 'quick']
But I would like to get the combinations for words, rather than letters.
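To make the desired output concrete, here is a minimal plain-Python sketch of what I mean. The function name `word_ngrams` and its `n` parameter are just my own illustration, not anything from the search engine's API, and it assumes words are split on whitespace only.

```python
def word_ngrams(text, n=2):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.split()
    # Slide a window of n words across the input; yields nothing if
    # the input has fewer than n words.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("quick brown fox jumped"))
# ['quick brown', 'brown fox', 'fox jumped']
```

I would like the analyzer to emit these word pairs as tokens at index time, rather than computing them in application code.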
Is this supported?
P.S. The reason I would like to do this is to suggest the next word, similar to how Google suggests the next word to use. I intend to use this tokenizer with the phrase suggester.
Upvotes: 2
Views: 384