I have a Solr server with a managed schema. However, the data that gets stored in it does not always abide by the defined types. Cleaning the data properly would take far too long for the amount of data that is inserted. Therefore I was thinking it would be great if Solr simply dropped a field that does not have the correct value type, instead of stopping the whole update operation.
For example, I have this schema:
<field name="someInt" type="pint" indexed="true" stored="true" multiValued="false" docValues="true" />
<field name="someOtherInt" type="pint" indexed="true" stored="true" multiValued="false" docValues="true" />
Then I insert documents using api/collections/myCollection/update:
[
  {
    "someInt": 0,
    "someOtherInt": 0
  },
  {
    "someInt": 1,
    "someOtherInt": "This is not an int"
  },
  {
    "someInt": 2,
    "someOtherInt": 2
  }
]
This results in the first object being indexed, while Solr reports the following for the second object: Error adding field 'someOtherInt'='This is not an int' msg=For input string: "This is not an int". The third object is never processed.
What I would like to have indexed is effectively:
[
  {
    "someInt": 0,
    "someOtherInt": 0
  },
  {
    "someInt": 1
  },
  {
    "someInt": 2,
    "someOtherInt": 2
  }
]
Additionally, an error message in the log would be great, but mainly I want all documents to be submitted and the invalid fields to be dropped.
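For illustration, the desired per-document behaviour corresponds roughly to the following minimal Python sketch of a client-side pre-processing step. This is not a Solr feature. The names `drop_invalid_fields`, `as_int32`, `INT_FIELDS` and the logger name are hypothetical, and the sketch assumes that a `pint` field accepts 32-bit integers given either as JSON numbers or as numeric strings.

```python
import json
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("solr_cleanup")  # hypothetical logger name

# Assumption: the fields declared as pint in the managed schema, kept in sync by hand.
INT_FIELDS = {"someInt", "someOtherInt"}

# pint holds a 32-bit signed integer
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def as_int32(value):
    """Return value as an int if it looks like a valid 32-bit int, else None."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject true/false
        return None
    if isinstance(value, int):
        number = value
    elif isinstance(value, str):
        try:
            number = int(value)  # assumption: numeric strings are acceptable to pint
        except ValueError:
            return None
    else:
        return None
    return number if INT32_MIN <= number <= INT32_MAX else None


def drop_invalid_fields(docs, int_fields=INT_FIELDS):
    """Return copies of docs, omitting int fields whose value is not a valid int."""
    cleaned = []
    for index, doc in enumerate(docs):
        kept = {}
        for name, value in doc.items():
            if name in int_fields and as_int32(value) is None:
                # Log the dropped field instead of failing the whole batch
                logger.warning("doc %d: dropping field %r=%r (not an int)", index, name, value)
                continue
            kept[name] = value
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    docs = [
        {"someInt": 0, "someOtherInt": 0},
        {"someInt": 1, "someOtherInt": "This is not an int"},
        {"someInt": 2, "someOtherInt": 2},
    ]
    # The cleaned list is what would then be sent to the update endpoint.
    print(json.dumps(drop_invalid_fields(docs)))
```

Because it only checks types, this is much cheaper than full data cleaning, but the set of integer fields has to be maintained alongside the schema.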
I was not able to find anything in the Documentation that could help me accomplish this. Any help would be greatly appreciated.