Reputation: 2664
I need to find the most used word after a given word. For an example collection,
Here the most used word after the word A is B. How can I find this in Solr?
Upvotes: 0
Views: 70
Reputation: 52912
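Create a field with ShingleFilterFactory as one of its filters. When the field is indexed, this generates token sequences (shingles) of adjacent words, so that A B C is indexed as A B and B C. By default the filter also emits the single words A, B and C; set outputUnigrams="false" if you only want the two-word sequences. You will want to use the WhitespaceTokenizer or something similar for the field.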
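To make the shingling step concrete, here is a minimal plain-Python sketch of the tokens such a field would end up holding. It only imitates the idea (whitespace tokenization plus shingles, with single words left out); it is not Solr code, and the function name `shingles` and its parameters are made up for illustration.

```python
def shingles(text, min_size=2, max_size=2):
    """Return word sequences (shingles) of min_size..max_size tokens.

    Hypothetical helper that imitates whitespace tokenization followed by
    shingling with unigrams switched off; it is not part of Solr.
    """
    tokens = text.split()  # split on whitespace
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


print(shingles("A B C"))  # ['A B', 'B C']
```

Each shingle keeps the first word and its follower in a single token, which is what lets a prefix such as "A " pick out everything that follows A.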
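Make a request that searches for field:A\ * (meaning everything starting with the word A) as the query, and add a facet for the field.
facet=true&facet.field=field&facet.limit=10&facet.sort=count
will give you the ten most used sequences in the matching documents. Because a facet counts every term in those documents, not only the ones that matched the query, also add facet.prefix=A%20 (the word followed by a space) so that only sequences starting with the word A are listed.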
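As a rough sketch, the request parameters could be assembled like this in Python. The field name `field` is the placeholder used in this answer; the helper name `next_word_facet_params`, the `rows=0` setting (skip the documents, return only facet counts) and the `facet.prefix` restriction are my assumptions rather than something stated in the original request above.

```python
from urllib.parse import urlencode


def next_word_facet_params(word, field="field", limit=10):
    """Build Solr query parameters for the most common sequences after `word`.

    Hypothetical helper; `field` must be the shingled field.
    """
    return {
        "q": f"{field}:{word}\\ *",   # prefix query: shingles starting with "<word> "
        "rows": 0,                     # assumption: only facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": f"{word} ",   # assumption: list only shingles starting with the word
        "facet.limit": limit,
        "facet.sort": "count",
    }


params = next_word_facet_params("A")
query_string = urlencode(params)  # append this to your collection's /select URL
print(query_string)
```

The first entry in the returned facet list is then the most common sequence, and the second word of that sequence is the word you are after.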
ShingleFilterFactory defaults to generating shingles with two tokens in each shingle, but you can tune this by altering minShingleSize and maxShingleSize.
Upvotes: 2