Reputation: 2252
I am converting a SOLR 4.10 db to SOLR 7.1.
In 4.10, I have a field that holds a phone number. Here is the schema definition for the field:
<field name="Phone" type="string" indexed="false" stored="true"/>
When inserting documents into SOLR, there are some documents where the value of Phone is an empty string or a single blank space.
When I run a query against SOLR 4.10, the returned documents that have an empty string or a single space in Phone still include the Phone field:
...
"FirstName":"Bob, No Phone",
"Phone":"",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"Phone":"",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...
But when these same rows are inserted into SOLR 7.1, the documents returned for those rows have no Phone field:
...
"FirstName":"Bob, No Phone",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...
Note that Donald still has a Phone field, presumably because his phone number is non-blank.
Is this behavior something that has been added since 4.10?
Is there a schema setting or solrconfig.xml setting that can turn the 4.10 behavior back on?
UPDATE
I also looked at the version of Java installed on the two boxes: the SOLR 4.10 box has Java 1.8.0_161, and the SOLR 7.1 box has Java 1.8.0_40. I wouldn't expect the Java version difference to cause this, since I believe SOLR just requires 1.8.
Upvotes: 1
Views: 397
Reputation: 2252
I fixed it.
When migrating, I created a new 7.1 core, which generated a new solrconfig.xml, and then I brought over configuration from the 4.10 core.
The default solrconfig.xml in 7.1 contained an updateRequestProcessorChain that uses RemoveBlankFieldUpdateProcessorFactory (referenced as remove-blank in the processor list):
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
         processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
This chain appears to be intended for schemaless mode, so I commented out the entire updateRequestProcessorChain, and the issue I was experiencing disappeared.
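If you want to keep the rest of the chain (for example, because you still rely on the other processors), a narrower change might be to drop only remove-blank from the processor list rather than commenting out the whole chain. The fragment below is a sketch of that idea, not something I tested on 7.1. It assumes the other named processors (uuid, field-name-mutating, parse-boolean, and so on) are still defined elsewhere in solrconfig.xml, as they were in the config I started from.

```xml
<!-- Sketch only: same chain as above, with "remove-blank" removed from the
     processor list so that empty-string values are no longer stripped.
     Assumption: the remaining named processors are defined elsewhere in
     this solrconfig.xml. -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
         processor="uuid,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Either way, documents indexed while remove-blank was active would need to be reindexed before the blank Phone values show up again, since the field was dropped at index time.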
Upvotes: 2