Reputation: 21
I want to parse a CSV file using the Solr CSV handler. The problem is that my file might contain problematic lines (lines with unescaped encapsulators). When Solr finds such a line, it fails with the following message and stops:
<str name="msg">CSVLoader: input=null, line=1941,can't read line: 1941
values={NO LINES AVAILABLE}</str><int name="code">400</int>
I understand that in that case the parser cannot fix the problematic line, and that is OK for me. I just want to skip the faulty line and continue with the rest of the file.
I tried using the TolerantUpdateProcessorFactory in my processor chain, but the result was the same.
I use Solr 6.5.1, and the curl command that I try is something like this:
curl '<path>/update?update.chain=tolerant&maxErrors=10&commit=true&fieldnames=<my fields are provided>,&skipLines=1' --data-binary @my_file.csv -H 'Content-type:application/csv'
Finally, this is what I put in my solrconfig.xml:
<updateRequestProcessorChain name="tolerant">
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">10</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Upvotes: 1
Views: 941
Reputation: 1573
I would suggest that you pre-process and clean the data using UpdateRequestProcessors.
This is a mechanism to transform the documents that are submitted to Solr for indexing.
Read more about UpdateRequestProcessors.
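Alternatively, since the error here occurs while Solr is parsing the CSV itself, you can clean the file before posting it. A minimal sketch in Python (the `clean_csv` helper is my own illustration, not part of Solr): it keeps only rows whose column count matches the header, dropping lines broken by unescaped encapsulators or stray delimiters.

```python
import csv
import io

def clean_csv(raw_text):
    """Return the CSV text with malformed data rows removed.

    A row is kept only if its column count matches the header row's,
    so lines mangled by unescaped quote characters or extra delimiters
    are silently skipped instead of aborting the whole upload.
    """
    reader = csv.reader(io.StringIO(raw_text))
    expected = None
    good_rows = []
    for row in reader:
        if expected is None:
            expected = len(row)  # the header defines the column count
        if len(row) == expected:
            good_rows.append(row)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(good_rows)
    return out.getvalue()

# Example: the second data line has one field too many and is dropped.
raw = "id,name\n1,alice\n2,bob,extra\n3,carol\n"
print(clean_csv(raw))
```

You would run the cleaned output through the same curl upload afterwards. This does not recover the faulty lines, but it matches what you asked for: skip them and index the rest.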
Upvotes: 1