user6732360

Custom Solr analyzers not being used during indexing

I have a bunch of PDF files on my machine which I want to index in Solr. For this purpose, I have created a schema file with custom field types and user-defined fields.

Given below are the fields and copyFields in my schema.xml:

<field name="id" type="custom01" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="custom02" indexed="true" stored="true" multiValued="true"/>
<field name="fileEx" type="custom03" indexed="false" stored="true" multiValued="false"/>

<copyField source="id" dest="fileEx"/>

The id field will contain the actual path of the indexed file. I plan to copy this value into fileEx and save just the extension of the file in the field using the custom analyzer as given in the field definition.

The following are my custom fieldType definitions:

<fieldType name="custom01" class="solr.TextField"> <!-- Dummy fieldType -->
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^$"/>
  </analyzer>
</fieldType>

<fieldType name="custom02" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\.([^.]*$)" group="0"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement=""/>
  </analyzer>
</fieldType>

When I indexed the files using this schema, the contents of the id field were copied into fileEx without any analysis applied: both id and fileEx had the same value. I used the Analysis tab in the Solr Admin UI to check that my fieldTypes actually work, and they behave as expected there.

But for some reason, the analyzers don't seem to be running properly while indexing actual documents.

So, at this point I am stuck and frustrated. Any help regarding this will be much appreciated. TIA.

Upvotes: 0

Views: 359

Answers (1)

MatsLindh

Reputation: 52822

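Do I understand correctly that you're asking why the text returned for a hit hasn't changed? The text returned is the stored value, which is the original input exactly as it was submitted, not the tokens produced by the analyzer. Analysis only affects what goes into the index for searching, and a copyField copies the original source value before any analysis takes place. So you will not see any change in the returned value by changing the analyzer. This is required to make things like highlighting work properly.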

If you want to change the text before it arrives in a field, use an update processor.
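As a rough sketch of what that could look like in solrconfig.xml, the chain below clones id into fileEx and then rewrites the cloned value with a regex, so that only the extension is stored. The chain name extract-extension is made up for illustration, and the pattern assumes every id is a file path that contains at least one dot; a value without a dot would be left unchanged. Treat it as a starting point rather than a tested configuration for your Solr version.

```xml
<!-- Hypothetical chain name; pick whatever fits your setup. -->
<updateRequestProcessorChain name="extract-extension">
  <!-- Copy the raw id value into fileEx before the document is indexed. -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">id</str>
    <str name="dest">fileEx</str>
  </processor>
  <!-- Replace the whole path with the text after the last dot. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">fileEx</str>
    <str name="pattern">^.*\.([^.]*)$</str>
    <str name="replacement">$1</str>
    <!-- Must be false so that $1 is treated as a group reference. -->
    <bool name="literalReplacement">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain only runs if it is selected, either by passing update.chain=extract-extension on the update request or by marking the chain as the default. With this in place the copyField from id to fileEx should be removed; otherwise fileEx would receive two values, which conflicts with multiValued="false".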

Upvotes: 2
