fatih tekin

Reputation: 989

Why can't CSV documents be inserted into Solr in a fail-safe way?

When a line in the CSV file does not have the correct number of fields, Solr does not insert any of the file. Is there a way to tell Solr to skip that line, persist the lines before it, and continue with the lines after it, doing the same for any further invalid lines?

Sample

C:\dev\tools\solr-4.7.2\apache-tomcat-6.0.37\bin>curl "http://localhost:8080/solr-4.7.2/update/csv?commit=true&rowid=id&fieldnames=interfaceSeq_s,extractId_s,country_s,invoiceNumber_s,originalLineId_s,keyValue_s,levelNumber_s,description_s,chargeGroup_s,chargeSubGroup_s,charge_s,startDateTime_s,endDateTime_s,totalValue_s,billedValue_s,discountValue_s,inclusiveValue_s,unitOfMeasure_s,attribute1_s,attribute2_s,attribute3_s,attribute4_s,attribute5_s,attribute6_s,attribute7_s,attribute8_s,totalUnits_s,inclusiveUnits_s,billedUnits_s,attribute11_s&skipLines=0&separator=%09&stream.file=C:\opt\invoices\input\5924usage_data1.dat&stream.contentType=text/csv&header=false&trim=true&rowidOffset=123758&literal.recordtype_s=usagedata&literal.filename_s=5924usage_data1.dat"

Response

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">24</int></lst><lst name="error"><str name="msg">CSVLoader: input=file:/C:/opt/invoices/input/5924usage_data1.dat,
line=2,expected 30 values but got 1
        values={'10000000003',}</str><int name="code">400</int></lst>
</response>

File Content

10000000001     593     FIVE                                639367  5       547674      4   0682791                     Subscription Charges            Communications                   fixe  gsm              2006281745  204623  0.1870          0.1870          0.0000          0.0000          Seconds                         ixed Line -          Mobile                                                                                   Telecom                                                                                       Carges                      31              0               31                                                                                                                                                                                                                                                                                  
10000000002     593     FIVE                                63367   5       547674      4   065050                      Subscription Charges            Communications                   fixe  gsm              2007010929  22952   0.1650          0.1650          0.0000          0.0000          Seconds                         Fixed Line -             Mobile                                                                                  TELECOM                                                                                            Cages                   7               0               7                                                                                                                                                                                                                                                                                   
10000000003

Upvotes: 1

Views: 1023

Answers (1)

fatih tekin

Reputation: 989

I found the answer. As the code below from org.apache.solr.handler.loader.CSVLoaderBase shows, this behavior is not configurable in the default CSV loader, so I had to write my own CSV request handler.

    if (vals.length != fieldnames.length) {
      input_err("expected "+fieldnames.length+" values but got "+vals.length, vals, line);
    }
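
Since the check above aborts on the first line with the wrong field count, one alternative to a custom handler might be to filter the file before handing it to Solr. The following is only a minimal sketch of that idea, not the handler mentioned above: `CsvLineFilter` and `keepValidLines` are hypothetical names, not part of Solr, and the plain split assumes no quoted fields that contain the separator, so its count may differ from what Solr's own CSV parser would produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: drops lines whose field count is wrong.
class CsvLineFilter {

    // Returns only the lines that split into exactly expectedFields values.
    // Assumption: fields are never quoted, so a plain split is sufficient.
    static List<String> keepValidLines(List<String> lines, String separator, int expectedFields) {
        Pattern sep = Pattern.compile(Pattern.quote(separator));
        List<String> valid = new ArrayList<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields so they are still counted.
            if (sep.split(line, -1).length == expectedFields) {
                valid.add(line);
            }
        }
        return valid;
    }
}
```

For the file in the question, the lines could be read with `Files.readAllLines`, passed through `keepValidLines(lines, "\t", 30)`, and written to a new file that `stream.file` then points to. The line containing only `10000000003` would be dropped, and the remaining lines would reach the default loader unchanged.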

Upvotes: 2
