Ahsan Iqbal
Ahsan Iqbal

Reputation: 1432

How Grouping can be achieved in Solrnet/Solr(Lucene)?

I have Lucene files indexed according to pageIds (UniqueKey). and one document can have multiple pages. Now once user perform some search it gives us pages that matches search criteria.

I am using Lucene.Net 2.9.2

We have 2 problems...

1- The file size is around 800GB and it has 130 million rows (pages) so the search time was really slow (all queries taking more than a min (we only have to return limited rows at a time)

To overcome the performance issue I shifted to SOLR which resolved the performance issue (which is quite strange as I am not using any extra functionality provided by SOLR like sharding etc - so could it be that Lucene.NET 2.9.2 is not really equivalent to performance compared to same version of JAVA??) but now I am having another issue...

2- The individual 'lucene document' is one page but i want to show results 'grouped by' 'real documents'. How many results I should be returned should be configurable based on 'real documents' not 'pages' (coz thats how I want to show to the user).

So lets say I want 20 'real documents' and ALL pages in them that matches the search criteria (doesnt matter if one document has 100 pages and another just 1).

From what I could get from SOLR forums was that it can be achieved by SOLR-236 patch (field collapsing) but I have not been able to apply the patch correctly with trunk (gives lots of errors).

This is really imp for me and I dont have much time, so can someone please either send me the SOLR 1.4.1 binary with this patch applied or guide me if there is any other way.

I would really appreciate it. Thanks!!

Upvotes: 1

Views: 652

Answers (3)

morja
morja

Reputation: 8560

You could also have a look at SOLR-1682 : Implement CollapseComponent, I havent tested it yet, but as far as I know it solves the collapsing too.

Upvotes: 0

Jokin
Jokin

Reputation: 4218

If your only requirement is to show page numbers, I would suggest to play with the highlighter or made some custom development. You can store the word number of the start and end of each page in a custom structure, and knowing the matched word position in the whole document you can know in what page it appears. If the documents are very large you will get a good performance improvement.

Upvotes: 0

Mauricio Scheffer
Mauricio Scheffer

Reputation: 99750

If you have issues with the collapse patch, then the Solr issue tracker is the channel to report them. I can see that other people are currently having some issues with it, so I suggest getting involved in its development.

That said: I recommend that if your application needs to search for 'real documents', then build your index around these 'real documents', not their individual pages.

Upvotes: 0

Related Questions