Reputation: 15219
I'm a bit stumped about how to add facets to an already existing Lucene index.
I have a Lucene index that was created with Lucene 3.1, without any facets.
I've looked over the Lucene documentation for facets, and it shows how to create an index with facets from scratch: you create a new Lucene Document object, use the taxonomy tools to add facet information (categories) to it, and then write that document to the Lucene index (using IndexWriter), which also adds extra data to the taxonomy index (via TaxonomyWriter), as described here:
However, what I want is to take the data already stored in the existing Lucene index and create from it a new Lucene index (with a taxonomy index alongside it) that contains exactly the same data as the original index, plus the various category information.
More precisely, my question is:
Is it enough to read a document from the original index, create its CategoryPath, and then write it to the new index, like this:
// Get a document from the original Lucene index:
Query query = queryParser.parse("*:*");
TopDocs originalTopDocs = originalIndexSearcher.search(query, 100);
Document originalDocument = originalIndexSearcher.doc(originalTopDocs.scoreDocs[1].doc);
// Create categories for the original document:
CategoryDocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxonomyWriter);
categoryDocBuilder.setCategoryPaths(categoriesPaths);
// Create a new document from the original document plus its categories:
Document originalDocumentWithCategories = categoryDocBuilder.build(originalDocument);
// Write the new document to the new index:
newIndexWriter.addDocument(originalDocumentWithCategories);
Does the above code index the same document as it was stored in the original index, but with added categories data? For example, will the data for the non-stored fields from the original document still be present in the newly created and indexed document?
Also, is there a better way to do this update (perhaps without creating a new index)?
Upvotes: 3
Views: 1299
Reputation: 15219
OK, here are some insights on how I solved this:
If you want to do it with Lucene only (as described in the question), you can only do that if:
All this being said, I've noticed that, at least for the facet part, it's easier to implement using Solr, and, at least in my situation, performance does not degrade and is in fact sometimes better. The advantage of Solr is that it creates facets "automagically" (on all the fields that are pertinent to faceting). No extra facet indexing, no manual declaration of facet "paths", etc. And the Solr query API for facets is friendlier than the Lucene one as well.
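To give an idea of what the Solr side looks like, here is a minimal sketch that builds the query string for a field-facet request using only the JDK. The parameters `q`, `rows`, `facet` and `facet.field` are standard Solr request parameters; the field names `category` and `author`, the `/select` path and the helper name `buildFacetQuery` are assumptions made up for illustration, not something taken from my actual setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class SolrFacetQuery {

    // Builds the query-string part of a Solr request with field faceting enabled.
    // One facet.field parameter is emitted per requested field.
    static String buildFacetQuery(String q, int rows, String... facetFields) {
        List<String> params = new ArrayList<String>();
        params.add("q=" + encode(q));
        params.add("rows=" + rows);
        params.add("facet=true");
        for (String field : facetFields) {
            params.add("facet.field=" + encode(field));
        }
        return String.join("&", params);
    }

    // URL-encodes a single parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field names; rows=0 asks for facet counts only, no documents.
        System.out.println("/select?" + buildFacetQuery("*:*", 0, "category", "author"));
    }
}
```

Appending the resulting string to the core's select URL should return the facet counts for each listed field next to the normal results, with no taxonomy index or category paths to maintain on the client side.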
Problems you might run into when migrating from Lucene to Solr are:
new SpanQuery("My blue boat*")
and automagically have the correct query terms created behind the scenes). If you want to translate Lucene queries that make heavy use of this programmatic query API into Solr queries, you have to write your own tools that generate the corresponding Lucene query string. You can of course still build the query objects using the Lucene API and then call toString() on them before sending them to Solr, but this does not always work, and it can get really complicated for certain complex queries.
Upvotes: 1