Rick Hodder

Reputation: 2252

SOLR 7.1 Leaving empty fields out of query results

I am converting a SOLR 4.10 db to SOLR 7.1.

In 4.10, I have a field that is a phone number (here's the schema information for the field):

<field name="Phone" type="string" indexed="false" stored="true"/>

When inserting documents into SOLR, there are some documents where the value of Phone is an empty string or a single blank space.

When I run a query against SOLR 4.10, the returned documents that have an empty string or a single space in Phone still include the Phone field:

...
"FirstName":"Bob, No Phone",
"Phone":"",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"Phone":"",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...

But when these same rows are inserted into SOLR 7.1, the documents returned for those rows have no Phone field:

...
"FirstName":"Bob, No Phone",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...

Notice that Donald still has a Phone field, presumably because his phone number is non-blank.

Is this something that has been added since 4.10?

Is there a schema setting or solrconfig.xml setting that can turn the 4.10 behavior back on?

UPDATE

I also looked at the version of Java installed on the two boxes: the SOLR 4.10 box has Java 1.8.0_161, and the SOLR 7.1 box has Java 1.8.0_40. I wouldn't think the Java version difference would cause this, since I believe SOLR just requires 1.8.

Upvotes: 1

Views: 397

Answers (1)

Rick Hodder

Reputation: 2252

I fixed it.

When migrating, I created a new 7.1 core, which generated a new solrconfig.xml, and then I brought over configuration from the 4.10 core.

The default solrconfig.xml in 7.1 contained an updateRequestProcessorChain that used RemoveBlankFieldUpdateProcessorFactory (the remove-blank entry in the processor list below), which strips empty field values from documents at index time.

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
         processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

This chain appears to be for schemaless mode, so I commented out the entire updateRequestProcessorChain, and the issue I was experiencing disappeared.
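
A less drastic alternative, which is only a sketch and was not what was done above, might be to keep the chain and drop just remove-blank from its processor list. This assumes the other named processors (uuid, field-name-mutating, and the parse-* entries) are still defined elsewhere in the config and are still wanted:

```xml
<!-- Sketch only: same chain as above, with "remove-blank" removed from the processor list
     so that empty-string values are no longer stripped at index time.
     Assumes the remaining named processors are defined elsewhere in solrconfig.xml. -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
         processor="uuid,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because update processors run when documents are indexed, documents that were indexed while remove-blank was active would need to be reindexed before the empty Phone field shows up in results.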

Upvotes: 2
