Roman

Reputation: 10403

How can I group Lucene's results?

My application indexes discussion threads. Each entry in the discussion is indexed as a separate Lucene document with a common_id field which can be used to group search hits into one discussion.

Currently, when a search is performed and a thread has 3 matching entries, 3 separate hits are returned. Even though this is correct, from the user's point of view the same discussion appears in the results multiple times.

Is there a way to tell Lucene to group its search results by the common_id field before returning them?

Upvotes: 0

Views: 1994

Answers (3)

Yossi Vainshtein

Reputation: 3985

Since version 3.2, Lucene supports grouping search results by a field through its grouping module: http://lucene.apache.org/core/4_1_0/grouping/org/apache/lucene/search/grouping/package-summary.html

Upvotes: 0

Yuval F

Reputation: 20621

I believe what you are asking for is field collapsing, which is a feature of Solr (and, I believe, of Elasticsearch as well).

If you want to roll your own, one possible way to do this is:

  1. Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
  2. Make an initial query to Lucene, and get a hit list.
  3. For each hit, check whether it has a series id; if it does, make another query by that series id to retrieve all the members of the series.
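The three steps above can be sketched in Java as follows. This is a minimal in-memory sketch, not Lucene code: the `Doc` record, the `search` predicate and the `findBySeriesId` lookup are hypothetical stand-ins for the initial Lucene query and the second query by series id (the series id plays the role of `common_id` from the question), and the order of the backing list stands in for score order. As a design choice it queries each series only once, even when several of its members match.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SeriesExpansion {

    // Hypothetical stand-in for a Lucene document; seriesId is null for
    // documents that are not part of any series.
    record Doc(String id, String seriesId, String text) {}

    // Hypothetical stand-in for the index; list order plays the role of score order.
    private final List<Doc> index;

    SeriesExpansion(List<Doc> index) {
        this.index = index;
    }

    // Step 2: the initial query, modelled here as a predicate over all docs.
    List<Doc> search(Predicate<Doc> query) {
        return index.stream().filter(query).collect(Collectors.toList());
    }

    // Step 3's second query: every member of one series.
    List<Doc> findBySeriesId(String seriesId) {
        return index.stream()
                .filter(d -> seriesId.equals(d.seriesId()))
                .collect(Collectors.toList());
    }

    // Steps 2 and 3 together: one entry per series (holding all its members)
    // and one entry per stand-alone hit, in the order each was first hit.
    Map<String, List<Doc>> searchGrouped(Predicate<Doc> query) {
        Map<String, List<Doc>> results = new LinkedHashMap<>();
        for (Doc hit : search(query)) {
            if (hit.seriesId() == null) {
                results.put("doc:" + hit.id(), List.of(hit));
            } else if (!results.containsKey("series:" + hit.seriesId())) {
                // Expand each series only once, even if several members matched.
                results.put("series:" + hit.seriesId(), findBySeriesId(hit.seriesId()));
            }
        }
        return results;
    }
}
```

For example, with three entries in one thread and one stand-alone document, a query matching two of the thread's entries yields a single `series:` entry containing all three entries, followed by the stand-alone document.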

An alternative is to store the ids of all the series members in a field inside each member's document.

Upvotes: 1

bajafresh4life

Reputation: 12853

There is nothing built into Lucene that collapses results based on a field. You will need to implement that yourself.

However, this feature has recently been built into Solr.

See http://www.lucidimagination.com/blog/2010/09/16/2446/
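If you implement the collapsing yourself on top of plain Lucene, a minimal sketch in Java could look like the following. It uses no Lucene types: it assumes you have already turned the hits into a list sorted by descending score (the order Lucene returns them in) and read each hit's `common_id` value. The `Hit` record and the `collapse` method are hypothetical names. It keeps the first, and therefore best-scoring, hit for each `common_id` and drops the rest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HitCollapser {

    // Hypothetical view of one search hit: Lucene doc id, score and the
    // stored value of its common_id field.
    record Hit(int docId, float score, String commonId) {}

    // Returns one hit per commonId. Assumes `hits` is already sorted by
    // descending score, so the first hit seen for a commonId is the best
    // one in that discussion; LinkedHashMap preserves that ranking order.
    static List<Hit> collapse(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hits) {
            best.putIfAbsent(hit.commonId(), hit);
        }
        return new ArrayList<>(best.values());
    }
}
```

Because several hits can collapse into one, a page of N discussions may need more than N raw hits, so you would typically request extra hits from `IndexSearcher.search(query, n)` and collapse until you have enough.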

Upvotes: 0
