br-

Reputation: 53

How do I extend Apache Solr's ExtractingRequestHandler to extract content from Protocol Buffer files?

I've set up an instance of SolrCloud. Now I want to index the content of files that are in Protocol Buffer format and also store them in Solr using the stored=true attribute. Storing a binary document is easy, but how do I instruct Solr to extract content from Protocol Buffer files? I know ExtractingRequestHandler can be extended to do this, but I was unable to find comprehensive documentation on how to do so on the wiki page: http://wiki.apache.org/solr/ExtractingRequestHandler

Upvotes: 0

Views: 172

Answers (1)

Andrea

Reputation: 2764

Instead of extending the ExtractingRequestHandler, I would go with SolrJ. That way you can do whatever you want: your client runs in a separate JVM, extracts the content (using your favourite library), and finally connects to Solr. Something like this:

// Extract content from PB files
String content = extractContentFromPBFiles();

// The facade towards Solr
SolrClient client = ...

// The Input value object (i.e. a Solr Document that needs to be indexed)
SolrInputDocument doc = new SolrInputDocument();
doc.setField("id", <your id>);
doc.setField("content", content);

// Add
client.add(doc);

// Commit (you may want to avoid this in case of massive inserts)
client.commit();
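
The extraction step in the snippet above is left open, so here is a rough sketch of one way it could look when you do not have the generated protobuf classes at hand. It walks the Protocol Buffers wire format directly and collects every top-level length-delimited field that happens to decode as printable UTF-8. The class name PbTextExtractor and the method extractText are hypothetical, and the "looks like text" check is only a heuristic I am assuming is good enough for indexing; if you do have the .proto definitions, reading the fields through the generated classes is likely the more reliable route.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: schema-less scan of a serialized protobuf message for text fields. */
final class PbTextExtractor {

    /** Joins, with single spaces, every top-level length-delimited field that decodes as printable UTF-8. */
    static String extractText(byte[] data) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            long[] tag = readVarint(data, pos);   // {value, next position}
            pos = (int) tag[1];
            int wireType = (int) (tag[0] & 7);    // low 3 bits of the tag hold the wire type
            switch (wireType) {
                case 0:                           // varint: skip the value
                    pos = (int) readVarint(data, pos)[1];
                    break;
                case 1:                           // fixed 64-bit
                    pos += 8;
                    break;
                case 5:                           // fixed 32-bit
                    pos += 4;
                    break;
                case 2: {                         // length-delimited: string, bytes or nested message
                    long[] len = readVarint(data, pos);
                    int start = (int) len[1];
                    if (len[0] < 0 || len[0] > data.length - start) {
                        throw new IllegalArgumentException("truncated length-delimited field");
                    }
                    String text = decodeIfText(data, start, (int) len[0]);
                    if (text != null) {
                        parts.add(text);
                    }
                    pos = start + (int) len[0];
                    break;
                }
                default:                          // groups (3, 4) and unknown types are not handled here
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
            if (pos > data.length) {
                throw new IllegalArgumentException("truncated fixed-width field");
            }
        }
        return String.join(" ", parts);
    }

    /** Reads a base-128 varint starting at pos; returns {value, position after the varint}. */
    private static long[] readVarint(byte[] data, int pos) {
        long value = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = data[pos++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new long[] {value, pos};
            }
        }
        throw new IllegalArgumentException("varint too long");
    }

    /** Heuristic: returns the field as a String if it is non-empty, valid UTF-8 without control characters, else null. */
    private static String decodeIfText(byte[] data, int offset, int length) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String s = decoder.decode(ByteBuffer.wrap(data, offset, length)).toString();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (Character.isISOControl(c) && !Character.isWhitespace(c)) {
                    return null;
                }
            }
            return s.isEmpty() ? null : s;
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

The returned string could then be passed to doc.setField("content", ...) as in the snippet above. Note that this sketch does not descend into nested messages, so text inside sub-messages is skipped.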

Upvotes: 1
