Jackrabbit deprecated SearchIndex textFilterClasses attribute

Question

I'm configuring Jackrabbit 2.3.6 and I need to index binary files (PDF, ODT). I've configured SearchIndex in repository.xml according to http://wiki.apache.org/jackrabbit/Search. But when I insert a file into the repository and run a full-text search, no results are returned.

Then I noticed a warning in the logs:

SearchIndex.java:2087 The textFilterClasses configuration parameter has been deprecated, and the configured value will be ignored: org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor

How should I configure SearchIndex to index binary data? This is how I do it now, which is deprecated and, according to the warning above, has no effect:

Thanks for replies.

RobSis · Accepted Answer

This is the answer to a similar question, given by Mark Herman on the Jackrabbit Users mailing list:

I'm not an expert, but what I do know is that JR uses Tika to extract text, and it decides how to extract it based on the jcr:mimeType property. If you don't supply a MIME type, it won't know how to extract the content (although I wouldn't recommend that as a practice). I believe there is a way to supply JR with a Tika config that might give you what you want. EDIT: There isn't. It's hardcoded.

Additionally, you can specify an indexing configuration in the repository/workspace XML files, where you can set rules for what gets indexed by Lucene and how.
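
Since extraction depends on jcr:mimeType, one practical step is to make sure every stored file carries a sensible value for it. Below is a minimal sketch, not part of the original answer, of a helper that picks a MIME type from a file name for the formats mentioned in the question. The class name `MimeTypes` and the method `forFileName` are hypothetical, and the fallback to `application/octet-stream` is an assumption.

```java
import java.util.Locale;

public class MimeTypes {

    // Fallback for unknown extensions (an assumption; a generic binary type
    // most likely means no text is extracted for that file).
    private static final String UNKNOWN = "application/octet-stream";

    /** Returns a MIME type for the given file name based on its extension. */
    public static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf": return "application/pdf";
            case "odt": return "application/vnd.oasis.opendocument.text";
            case "txt": return "text/plain";
            default:    return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // With the JCR API the value would go on the file's jcr:content node,
        // for example (javax.jcr.Node, not compiled here):
        //   content.setProperty("jcr:mimeType", MimeTypes.forFileName(name));
        System.out.println(forFileName("report.pdf"));
        System.out.println(forFileName("notes.ODT"));
    }
}
```

The lookup is kept explicit rather than relying on platform tables so that the same file name always yields the same jcr:mimeType value.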
