d-_-b

SOLR 4 - how to index html plain text

With SOLR 4, how can I index a plain text document with HTML code inside of it, without the HTML being stripped out?

For example, <b>bold text</b> is turned into "bold text".

Thanks!

Answers (1)

Paige Cook

Most likely the fieldType of the field where you are storing your text document uses solr.HTMLStripCharFilterFactory. This char filter strips HTML tags such as <b> and </b> from the text before it is tokenized, so the tags never make it into the indexed terms. You can check this in your schema.xml file.
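
When checking schema.xml, look for a fieldType whose analyzer contains a charFilter element like the one below. This is only a sketch of what such a Solr 4 definition typically looks like; the names text_html and content are hypothetical placeholders, and your own tokenizer and filters may differ.

```xml
<!-- Hypothetical field type that strips HTML before tokenizing.
     "text_html" and "content" are placeholder names. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Removes markup such as <b> and </b> from the text -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- A field using that type -->
<field name="content" type="text_html" indexed="true" stored="true"/>
```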

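You will need to modify the fieldType for this field, either by removing that charFilter from the existing definition or by defining a new field type without it and changing the field to use that type. Because the char filter is applied when documents are indexed, you will then need to re-index your documents for the change to take effect. For additional information about setting up your schema, please refer to the following resources.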
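
A minimal sketch of the second option, assuming the field is named content and the new type is called text_html_raw (both hypothetical names). WhitespaceTokenizerFactory is an assumed choice here: it splits only on whitespace, so markup such as <b> stays inside the tokens, whereas a tokenizer like StandardTokenizerFactory would still discard the angle brackets even without the char filter.

```xml
<!-- Hypothetical field type with no HTMLStripCharFilterFactory,
     so markup is passed through to the tokenizer unchanged. -->
<fieldType name="text_html_raw" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "<b>bold text</b>" yields
         the tokens "<b>bold" and "text</b>" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the (hypothetical) field at the new type -->
<field name="content" type="text_html_raw" indexed="true" stored="true"/>
```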
