I've set up an instance of SolrCloud. Now I want to index the content of files that are in protocol buffer format, and also store them in Solr using the stored=true attribute. Storing a binary document is easy, but how do I instruct Solr to extract the content from protocol buffer files?
I know the ExtractingRequestHandler can be extended to do this, but I was unable to find comprehensive documentation on how to do it on the wiki page here: http://wiki.apache.org/solr/ExtractingRequestHandler.
Instead of extending the ExtractingRequestHandler, I would go with SolrJ. That way you have full control: your client runs in a separate JVM, extracts the content (using your favourite library) and then sends the documents to Solr. Something like this:
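import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

// Extract content from PB files (extractContentFromPBFiles() is your own parsing code)
String content = extractContentFromPBFiles();
// The facade towards Solr, e.g. a CloudSolrClient for SolrCloud
SolrClient client = ...;
// The input value object (i.e. a Solr document that needs to be indexed);
// "content" must be declared stored="true" in the schema if you want it stored
SolrInputDocument doc = new SolrInputDocument();
doc.setField("id", <your id>);
doc.setField("content", content);
// Add
client.add(doc);
// Commit (for massive inserts, avoid committing per document: commit once
// at the end, or rely on autoCommit / commitWithin)
client.commit();
// Release the client's underlying HTTP resources
client.close();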
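The `extractContentFromPBFiles()` step above is left to you. If you have the generated classes for your `.proto` schema, parsing each file with them and reading the fields you care about is the more reliable route. Without a schema, one rough option is to walk the protobuf wire format directly. The sketch below is a hypothetical helper (`ProtobufTextExtractor` is not part of SolrJ or protobuf-java). It assumes each file holds a single, non-delimited message, and it uses a heuristic: any length-delimited field that decodes as strict UTF-8 without control characters is treated as text, and anything else is tried as a nested message. That heuristic can misclassify some nested messages or `bytes` fields, so treat it as a starting point only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical schema-less text extractor for a file holding one protobuf message. */
final class ProtobufTextExtractor {
    private static final int MAX_DEPTH = 32; // guard against deeply nested or hostile input

    static String extractText(Path file) throws IOException {
        return extractText(Files.readAllBytes(file));
    }

    static String extractText(byte[] data) {
        List<String> parts = new ArrayList<>();
        if (!parseMessage(data, 0, data.length, parts, 0)) {
            throw new IllegalArgumentException("not a well-formed protobuf message");
        }
        return String.join("\n", parts);
    }

    /** Walks buf[start, end) as protobuf fields; returns false if the bytes are not a valid message. */
    private static boolean parseMessage(byte[] buf, int start, int end, List<String> out, int depth) {
        if (depth > MAX_DEPTH) return false;
        List<String> found = new ArrayList<>(); // merged into out only on success
        int[] pos = {start};
        while (pos[0] < end) {
            long tag = readVarint(buf, pos, end);
            if (pos[0] < 0 || (tag >>> 3) == 0) return false; // field number 0 is invalid
            switch ((int) (tag & 0x7)) {
                case 0: // varint: numeric value, skipped
                    readVarint(buf, pos, end);
                    if (pos[0] < 0) return false;
                    break;
                case 1: // fixed64: skip 8 bytes
                    pos[0] += 8;
                    break;
                case 5: // fixed32: skip 4 bytes
                    pos[0] += 4;
                    break;
                case 2: { // length-delimited: string, bytes or nested message
                    long len = readVarint(buf, pos, end);
                    if (pos[0] < 0 || len < 0 || len > end - pos[0]) return false;
                    int from = pos[0];
                    int to = from + (int) len;
                    String text = decodeText(buf, from, to);
                    if (text != null) {
                        if (!text.isBlank()) found.add(text);
                    } else {
                        // Not text: try it as a nested message; if that fails, it is opaque bytes.
                        parseMessage(buf, from, to, found, depth + 1);
                    }
                    pos[0] = to;
                    break;
                }
                default: // groups (3, 4) and wire types 6, 7 are not supported
                    return false;
            }
            if (pos[0] > end) return false;
        }
        out.addAll(found);
        return true;
    }

    /** Reads a base-128 varint at pos[0] and advances it; sets pos[0] = -1 on malformed input. */
    private static long readVarint(byte[] buf, int[] pos, int end) {
        long result = 0;
        for (int shift = 0; shift < 64 && pos[0] < end; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        pos[0] = -1;
        return 0;
    }

    /** Returns the bytes as a string if they are strict UTF-8 without control characters, else null. */
    private static String decodeText(byte[] buf, int from, int to) {
        try {
            String s = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(buf, from, to - from))
                    .toString();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (Character.isISOControl(c) && c != '\n' && c != '\r' && c != '\t') return null;
            }
            return s;
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

The returned string could then be used for the `content` field in place of `extractContentFromPBFiles()`. Throwing on a malformed top-level message, rather than silently indexing nothing, makes it easier to spot files that are not plain protobuf messages (for example, length-prefixed streams written with `writeDelimitedTo`).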