Lahiru

Reputation: 2664

Solr: find the most used word after a given word

I need to find the word that most often follows a given word. For example, given this collection:

  1. A B
  2. A C
  3. A B
  4. B C

Here the most used word after the word A is B. How can I find this in Solr?

Upvotes: 0

Views: 70

Answers (1)

MatsLindh

Reputation: 52912

Create a field with ShingleFilterFactory as one of its filters. When the field is indexed, this generates a token for each pair of adjacent words, so that A B C is indexed as A B and B C. By default the filter also emits the single words (A, B, C); set outputUnigrams to false if you only want the pairs. You will want to use the WhitespaceTokenizer or something similar for the field.
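
A minimal sketch of what such a field definition could look like in the schema. The names text_shingle and field are placeholders chosen for this example, and the attribute values are assumptions to adapt to your own setup:

```xml
<!-- Hypothetical field type: splits on whitespace, then emits two-word shingles only -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- outputUnigrams="false" keeps single words out of the index for this field -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="2"
            outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above; faceting needs it to be indexed -->
<field name="field" type="text_shingle" indexed="true" stored="false"/>
```

With this analyzer, the text A B C should be indexed as the two terms A B and B C.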

Make a request with field:A\ * as the query (meaning every term that starts with the word A followed by a space), and add a facet on the same field:

facet=true&facet.field=field&facet.limit=10&facet.sort=count

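This will give you the ten most used sequences that start with the word A. Note that the facet counts every term in the matching documents, so longer documents also contribute their other shingles (for example B C); adding facet.prefix with the value A followed by a space limits the facet to terms that start with A.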

ShingleFilterFactory defaults to generating shingles with two tokens in each shingle, but you can tune this by altering minShingleSize and maxShingleSize.
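
As an illustration only, a filter that emits both two-word and three-word shingles might be configured like this; the sizes are picked for the example, not a recommendation:

```xml
<!-- Example values: with these settings A B C yields A B, A B C and B C -->
<filter class="solr.ShingleFilterFactory"
        minShingleSize="2"
        maxShingleSize="3"
        outputUnigrams="false"/>
```

Keep in mind that with larger shingles the facet will also return the longer sequences, so for the "next word" use case the two-word default is probably what you want.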

Upvotes: 2
