I have a bunch of PDF files on my machine which I want to index in Solr. For this purpose, I have created a schema file with custom field types and user-defined fields.
Given below are the fields and copyFields in my schema.xml:
<field name="id" type="custom01" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="custom02" indexed="true" stored="true" multiValued="true"/>
<field name="fileEx" type="custom03" indexed="false" stored="true" multiValued="false"/>
<copyField source="id" dest="fileEx"/>
The id field will contain the actual path of the indexed file. I plan to copy this value into fileEx and store only the file extension in that field, using the custom analyzer referenced in its field definition.
The following are my custom fieldType definitions:
<fieldType name="custom01" class="solr.TextField"> <!-- Dummy fieldType -->
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^$"/>
  </analyzer>
</fieldType>
<fieldType name="custom02" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\.([^.]*$)" group="0"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement=""/>
  </analyzer>
</fieldType>
When I tried to index the files using this schema, the contents of the id field were simply copied into fileEx without any analysis being applied. Both id and fileEx had the same value. I used the Analysis screen in the Solr Admin UI to check whether my fieldTypes actually work, and they behave as expected there.
But for some reason, the analyzers don't seem to run while indexing actual documents.
So, at this point I am stuck and frustrated. Any help regarding this will be much appreciated. TIA.
Do I understand correctly that you're asking why the text returned for a hit hasn't changed? The stored value that is returned is the original input before any analysis, not the tokenized contents of the field. Changing the analyzer will therefore never change the value you get back. This is required for features such as highlighting to work properly.
If you want to change the text itself before it reaches a field, use an update request processor.
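A minimal sketch of such a chain for solrconfig.xml, assuming the file path arrives in id as described in the question: CloneFieldUpdateProcessorFactory copies id into fileEx, and RegexReplaceProcessorFactory then rewrites fileEx to the text after the last dot. Both run on the incoming document before analysis, so the rewritten value is what gets stored and returned. The chain name extract-extension is hypothetical, and the chain only runs when it is selected, for example with the update.chain request parameter or in the defaults of the handler used to post the PDFs. The copyField from id to fileEx would also have to be removed, because copyField is applied after the chain and would add a second value to the single-valued fileEx.

```xml
<!-- Hypothetical chain name; select it with update.chain=extract-extension -->
<updateRequestProcessorChain name="extract-extension">
  <!-- Copy the original id (the file path) into fileEx -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">id</str>
    <str name="dest">fileEx</str>
  </processor>
  <!-- Keep only the text after the last dot, e.g. /docs/report.pdf becomes pdf.
       Values with no dot in the final path segment are left unchanged. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">fileEx</str>
    <str name="pattern">^.*\.([^./]*)$</str>
    <str name="replacement">$1</str>
    <!-- false so that $1 is treated as a group reference, not literal text -->
    <bool name="literalReplacement">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Must be last: actually adds the document to the index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With this chain, a document whose id is /docs/report.pdf would store pdf in fileEx. The analyzer on fileEx's field type only affects indexed tokens, and since fileEx is declared with indexed="false", it has no effect on this field either way.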