Reputation: 989
When a line in the CSV file does not have the correct number of fields, Solr rejects the whole file and indexes nothing. Is there any way to tell Solr to skip that line, persist the lines before it, and continue with the lines after it, doing the same for any further invalid lines?
Sample
C:\dev\tools\solr-4.7.2\apache-tomcat-6.0.37\bin>curl "http://localhost:8080/solr-4.7.2/update/csv?commit=true&rowid=id&fieldnames=interfaceSeq_s,extractId_s,country_s,invoiceNumber_s,originalLineId_s,keyValue_s,levelNumber_s,description_s,chargeGroup_s,chargeSubGroup_s,charge_s,startDateTime_s,endDateTime_s,totalValue_s,billedValue_s,discountValue_s,inclusiveValue_s,unitOfMeasure_s,attribute1_s,attribute2_s,attribute3_s,attribute4_s,attribute5_s,attribute6_s,attribute7_s,attribute8_s,totalUnits_s,inclusiveUnits_s,billedUnits_s,attribute11_s&skipLines=0&separator=%09&stream.file=C:\opt\invoices\input\5924usage_data1.dat&stream.contentType=text/csv&header=false&trim=true&rowidOffset=123758&literal.recordtype_s=usagedata&literal.filename_s=5924usage_data1.dat"
Response
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">24</int></lst><lst name="error"><str name="msg">CSVLoader: input=file:/C:/opt/invoices/input/5924usage_data1.dat, line=2,expected 30 values but got 1
values={'10000000003',}</str><int name="code">400</int></lst>
</response>
File Content
10000000001 593 FIVE 639367 5 547674 4 0682791 Subscription Charges Communications fixe gsm 2006281745 204623 0.1870 0.1870 0.0000 0.0000 Seconds ixed Line - Mobile Telecom Carges 31 0 31
10000000002 593 FIVE 63367 5 547674 4 065050 Subscription Charges Communications fixe gsm 2007010929 22952 0.1650 0.1650 0.0000 0.0000 Seconds Fixed Line - Mobile TELECOM Cages 7 0 7
10000000003
Upvotes: 1
Views: 1023
Reputation: 989
I found the answer. As the code below from org.apache.solr.handler.loader.CSVLoaderBase shows, this check is not configurable in the default CSV loader, so I had to write my own CSV request handler.
if (vals.length != fieldnames.length) {
input_err("expected "+fieldnames.length+" values but got "+vals.length, vals, line);
}
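If writing a custom handler is not an option, a different workaround is to drop the malformed rows before the file ever reaches Solr. The following is only a minimal sketch of that pre-filtering idea, not the handler described above: CsvLineFilter and its methods are hypothetical names, not part of Solr, and it assumes a plain separator split with no quoted fields that contain the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not a Solr class): keeps only the rows whose
// field count matches the number of names passed in fieldnames.
final class CsvLineFilter {

    // Splits on the literal separator. The limit of -1 keeps trailing
    // empty fields, so a row that ends in empty columns still counts.
    // Assumption: no field is quoted and contains the separator itself.
    static boolean hasExpectedFields(String line, char separator, int expected) {
        String literal = Pattern.quote(String.valueOf(separator));
        return line.split(literal, -1).length == expected;
    }

    // Returns the valid rows in their original order; invalid rows are skipped.
    static List<String> keepValid(List<String> lines, char separator, int expected) {
        List<String> valid = new ArrayList<>();
        for (String line : lines) {
            if (hasExpectedFields(line, separator, expected)) {
                valid.add(line);
            }
        }
        return valid;
    }
}
```

One way to use it would be to read the input with java.nio.file.Files.readAllLines, call keepValid(lines, '\t', 30), write the result to a new file with Files.write, and point stream.file at that filtered file. The count of 30 matches the 30 names in the fieldnames parameter of the request above.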
Upvotes: 2