Reputation: 360
I'm writing a basic Lucene.Net application to index what are essentially forum posts. To simplify, each Post document has a URL and some Content. For each given thread I'm indexing each Post as a separate document (indexing whole threads as single documents returns too many false positives when searching).
The problem I'm having is dealing with multiple Post documents having the same URL in my result sets. When I search and return 10 results, I want each result to refer to a different URL.
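For reference, the indexing side looks roughly like this (a simplified sketch: post and writer stand in for my own types, and the field names are illustrative, with "body" matching the query parser below):
// rough sketch of how each Post is indexed
Document doc = new Document();
doc.Add(new Field("url", post.Url, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("body", post.Content, Field.Store.YES, Field.Index.ANALYZED));
writer.AddDocument(doc);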
Currently, I have something along the lines of the following:
// setup
StandardAnalyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
FSDirectory directory = FSDirectory.Open(indexLocation);
IndexSearcher searcher = new IndexSearcher(directory);
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "body", analyzer);
// search
Query query = parser.Parse(queryString);
TopDocs topDocs = searcher.Search(query, null, 10);
However, of the ten results returned there may be only 7 unique URLs. I've looked at discarding the duplicates and searching again with a larger result set, skipping the results I've already seen (similar to pagination), until I have 10 unique URLs, but this raises questions such as: when should I stop because there are no more results?
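Roughly, that over-fetch approach would look like this (just a sketch, continuing from the setup above; "url" is the stored field holding each post's URL, and the page-size doubling is arbitrary):
// sketch of the "fetch more and discard duplicates" approach - not what I want to ship
List<Document> unique = new List<Document>();
HashSet<string> seenUrls = new HashSet<string>();
int fetch = 10;
while (unique.Count < 10)
{
    TopDocs page = searcher.Search(query, null, fetch);
    foreach (ScoreDoc sd in page.ScoreDocs)
    {
        Document doc = searcher.Doc(sd.Doc);
        if (seenUrls.Add(doc.Get("url")))   // Add returns false for a URL we've already kept
            unique.Add(doc);
        if (unique.Count == 10) break;
    }
    if (fetch >= page.TotalHits) break;     // no more results in the index - the awkward stopping case
    fetch *= 2;                             // otherwise retry with a larger page
}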
It feels like there should be a way of filtering at the TopDocs topDocs = searcher.Search() call itself, so that it returns 10 results with unique URLs. I can't find anything about this (perhaps I'm not using the correct terminology), but I'm sure a lot of other applications must have solved this before... Does anything like this already exist, or can anyone offer pointers on how to go about implementing it?
Upvotes: 2
Views: 688
Reputation: 76
I have developed this functionality as an extra filter for Lucene 2.9.x, and I have since found that it would need to be rewritten completely for 4.8.
So, in case you are using a 2.9.x version, here is a solution: write your own hit collector whose Collect method also checks whether the document is in a set of unique documents (a bit array). The bit array has to be constructed once and cached (source):
public class DistinctValuesFilter
{
    #region ctor

    public DistinctValuesFilter(IndexReader searchReader, ISearchRequest request, int docLength, Analyzer anlzr)
    {
        this.distinctBy = StringHelper.Intern(request.DistinctBy);
        this.processedMask = new OpenBitSetDISI(docLength);
        FindDuplicateTermsDirectly(searchReader);
        iireader = searchReader;
        // lazily load the per-document field values from the field cache on first use
        ivalue = new Lazy<string[]>(() => FieldCache_Fields.DEFAULT.GetStrings(iireader, distinctBy));
    }

    /// <summary>
    /// Initializes the duplicates hash set and the array of positions where duplicates are located.
    /// Code partially taken from Lucene: Lucene.Net.Search.FieldCacheImpl.StringIndexCache,
    /// protected internal override object CreateValue(IndexReader reader, Entry entryKey)
    /// </summary>
    /// <param name="ireader">index reader to process duplicates</param>
    private void FindDuplicateTermsDirectly(IndexReader ireader)
    {
        var maxLength = ireader.MaxDoc();
        duplicates = new HashSet<int>(maxLength);
        duplicatesLocations = new int[maxLength];
        var termEnum = ireader.Terms(new Term(this.distinctBy));
        var termDocs = ireader.TermDocs();
        int num = 0;
        int k, p;
        int firstDocNo = -1;
        try
        {
            do
            {
                var term = termEnum.Term();
                if (term == null || term.Field() != this.distinctBy || num >= maxLength)
                    break;
                termDocs.Seek(termEnum);
                p = 0;
                while (termDocs.Next())
                {
                    k = termDocs.Doc();
                    duplicatesLocations[k] = num + 1; // 0 indicates a document with no value at all
                    if (p > 0) duplicates.Add(firstDocNo);
                    firstDocNo = k;
                    p++;
                }
                if (p > 1) duplicates.Add(firstDocNo);
                num++;
            }
            while (termEnum.Next());
        }
        finally
        {
            termDocs.Close();
            termEnum.Close();
        }
    }

    protected IndexReader iireader;
    protected string distinctBy;
    private HashSet<int> duplicates;
    private int[] duplicatesLocations;
    private OpenBitSetDISI processedMask;
    private Lazy<string[]> ivalue;

    #endregion

    public bool IsDistinct(int docIndex)
    {
        if (this.processedMask.FastGet(docIndex)) return false; // a document with this value was already processed
        if (duplicatesLocations[docIndex] == 0) return false;   // the field value is missing entirely
        if (!duplicates.Contains(docIndex)) return true;        // the value occurs only once
        var dval = duplicatesLocations[docIndex];
        var v = ivalue.Value;
        var xv = string.Empty;
        for (int i = 0; i < duplicatesLocations.Length; i++)
        {
            if (duplicatesLocations[i] == dval)
            {
                this.processedMask.FastSet(i);
                if (!string.IsNullOrEmpty(xv) && xv != v[i])
                {
                    throw new NotSupportedException($"values are not the same ({i}): [{xv}] != [{v[i]}]");
                }
                xv = v[i];
            }
        }
        return true;
    }
}
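For completeness, here is a rough sketch of the kind of hit collector I mean, wired to the filter above (the class is mine, written against the 2.9.x-style API; member names such as AcceptsDocsOutOfOrder differ slightly between releases, so adjust for your exact version). Note it collects distinct hits in doc-id order; for relevance ordering you would combine the IsDistinct check with a score-based priority queue, as TopScoreDocCollector does.
// Sketch only: keeps the first maxHits documents whose distinct-by value has not been seen yet.
public class DistinctCollector : Collector
{
    private readonly DistinctValuesFilter filter;
    private readonly int maxHits;
    private readonly List<int> hits = new List<int>();
    private int docBase;

    public DistinctCollector(DistinctValuesFilter filter, int maxHits)
    {
        this.filter = filter;
        this.maxHits = maxHits;
    }

    public List<int> Hits { get { return hits; } }

    public override void SetScorer(Scorer scorer)
    {
        // scores are ignored in this sketch
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        this.docBase = docBase; // the filter was built on the top-level reader, so remember the segment offset
    }

    public override void Collect(int doc)
    {
        int topLevelDoc = docBase + doc;
        if (hits.Count < maxHits && filter.IsDistinct(topLevelDoc))
            hits.Add(topLevelDoc);
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return false; // collect in index order so the first document per value wins
    }
}

// usage sketch: build the filter once per reader, then search with the collector
// var filter = new DistinctValuesFilter(reader, request, reader.MaxDoc(), analyzer);
// var collector = new DistinctCollector(filter, 10);
// searcher.Search(query, collector);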
Upvotes: 0