Dmitrii Volosnykh

Reputation: 1185

Solr: store the filtered field value instead of the original

I am trying to index a Wikipedia dump. To provide an abstract for each article (or maybe to enable highlighting in the future), I'd like to store the article text without WikiMarkup. As a first try, it would be enough to keep just the alphanumeric symbols. So the question is: is it possible to store the field value after character-level filtering, rather than the original one?
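For reference, the "alphanumeric symbols only" filter described above can be sketched in Python. This is a minimal sketch, assuming that every character that is neither a letter or digit (in the Unicode sense of `str.isalnum`) nor whitespace should become a space, and that runs of whitespace are then collapsed. The function name `keep_alphanumeric` is hypothetical.

```python
def keep_alphanumeric(text: str) -> str:
    """Replace every non-alphanumeric, non-whitespace character with a space,
    then collapse runs of whitespace into single spaces."""
    # str.isalnum() is Unicode-aware, so letters such as "é" or "ж" survive.
    filtered = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    # split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(filtered.split())


if __name__ == "__main__":
    sample = "'''Solr''' is a [[search engine|search server]] written in {{lang|Java}}."
    print(keep_alphanumeric(sample))
```

This prints `Solr is a search engine search server written in lang Java`. Link targets and template names (`search engine`, `lang`) survive, which is why this is only a first approximation of "text without WikiMarkup".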

Upvotes: 3

Views: 1072

Answers (2)

jpountz

Reputation: 9964

There is no way to do this out of the box. If you want Solr itself to do it, you can write your own update handler (in practice, a custom UpdateRequestProcessor in the update chain), but this might be a little tricky. The easiest way is to pre-process each document before sending it to Solr.
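The pre-processing route can be sketched in Python with only the standard library: clean the text on the client, put the cleaned copy into its own field, and post the documents as a JSON array to Solr's `/update` handler (older Solr releases exposed JSON updates at `/update/json`). This is a minimal sketch under stated assumptions: the Solr URL, the core name `wikipedia`, and the field names `id`, `title`, `text` and `text_plain` are hypothetical and must match your own `schema.xml`, and `clean` is just a crude alphanumeric filter.

```python
import json
import urllib.request


def clean(text: str) -> str:
    """Crude character-level filter: keep letters, digits and whitespace only."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.split())


def to_solr_doc(page_id: str, title: str, wikitext: str) -> dict:
    """Build one Solr document (hypothetical field names). The raw wikitext
    goes into "text" for index-time analysis, while the pre-filtered copy goes
    into "text_plain", the field you would mark stored="true" for abstracts."""
    return {"id": page_id, "title": title, "text": wikitext, "text_plain": clean(wikitext)}


def build_update_request(docs: list, solr_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [to_solr_doc("1", "Solr", "'''Solr''' is a [[search engine|search server]].")]
    # Hypothetical local Solr core; adjust to your installation.
    req = build_update_request(docs, "http://localhost:8983/solr/wikipedia")
    print(req.full_url)
    print(docs[0]["text_plain"])
    # urllib.request.urlopen(req) would actually send it to a running Solr.
```

Keeping the raw text and the cleaned text in separate fields lets the analyzer chain still see the original for searching, while the stored field returned with results is already free of markup.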

Upvotes: 2

Paige Cook

Reputation: 22555

By default, Solr stores the original field value, before the index-time analyzers for your fieldType have applied their filters. The filtered output only ends up in the index (the tokens you search against), not in the stored value. However, you have two options for getting the result you want.

  1. You can apply the same filters at query time that are applied at index time to remove the wiki markup. Please see Analyzers, Tokenizers and Token Filters on the Solr Wiki for more details.
  2. You can apply the filters to the data in a separate process before loading it into Solr. Solr will then store the filtered values, since you are passing them in already filtered.
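For option 2, a pre-filter somewhat better than "alphanumerics only" can remove the most common wiki markup before the data is posted, so the value Solr stores is already clean. The sketch below is a rough, regex-based approximation written for illustration, not a MediaWiki parser: it assumes templates contain no stray single braces, removes `{{templates}}` (nested ones are collapsed from the inside out), turns `[[target|label]]` links into their label, drops `''`/`'''` emphasis, unwraps `== headings ==`, and strips HTML-style tags while keeping their content (so `<ref>` citation text remains). The function name `strip_wiki_markup` is hypothetical.

```python
import re

# Each pattern is a rough approximation of one piece of MediaWiki syntax.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                     # innermost {{...}} template
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")    # [[target|label]] -> label, [[target]] -> target
EMPHASIS = re.compile(r"'{2,}")                              # '' italic and ''' bold markers
HEADING = re.compile(r"^=+\s*(.*?)\s*=+\s*$", re.MULTILINE)  # == Heading == -> Heading
TAG = re.compile(r"<[^>]+>")                                 # <br/>, <ref>, ... (content is kept)


def strip_wiki_markup(wikitext: str) -> str:
    """Approximate plain text for a Wikipedia article body."""
    text = wikitext
    # Remove innermost templates until none remain, so nested ones collapse too.
    while True:
        text, count = TEMPLATE.subn("", text)
        if count == 0:
            break
    text = LINK.sub(r"\1", text)
    text = EMPHASIS.sub("", text)
    text = HEADING.sub(r"\1", text)
    text = TAG.sub("", text)
    # Collapse newlines and repeated spaces left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    page = ("== History ==\n"
            "'''Solr''' is a [[search engine|search server]] written in [[Java]]."
            "{{citation needed|date=2011}}<br/>")
    print(strip_wiki_markup(page))
```

This prints `History Solr is a search server written in Java.` Tables, image links with several `|` parameters and other constructs are not handled; for a full dump, a dedicated wikitext parser is the safer choice.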

Upvotes: 1
