Reputation: 117
I have a CSV file with over 20 million rows, delimited by the vertical bar. The problem is that one text column in the file also contains vertical bars within its values, which breaks the layout and shifts data into the next column when importing the file into SQL Server.
The file is too big to open in a text editor, so we cannot simply add a qualifier or change the delimiter, even with advanced editors.
Any ideas? Ideally, is there a general solution for issues like this? Even when qualifiers are used, text fields can still contain qualifier-like strings, delimiters, etc.
The fields are not quoted. The rows look simply like this:
field1|field2|field3|field4
1|000|some text|some text
2|001|some text con|taining pipe|some text
3|002|some text|some text
Upvotes: 2
Views: 1741
Reputation: 44911
With access to bash (Linux/Unix/Cygwin, etc.):
To estimate the severity of the issue, count the records that have exactly 4 fields versus those with any other number of fields.
awk -F'|' '{rec[NF==4?"NF=4":"NF!=4"]++}END{for(nf in rec){print nf,rec[nf]}}' MyFile.csv
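For instance, feeding the sample rows from the question through the same command (a hypothetical run; note that the header line also counts as a 4-field record):

```shell
# Count the field distribution on the sample data from the question.
# The rows are fed inline here; in practice point awk at MyFile.csv instead.
printf '%s\n' \
  'field1|field2|field3|field4' \
  '1|000|some text|some text' \
  '2|001|some text con|taining pipe|some text' \
  '3|002|some text|some text' |
awk -F'|' '{rec[NF==4?"NF=4":"NF!=4"]++}END{for(nf in rec){print nf,rec[nf]}}'
# The header and two data rows have 4 fields; the row with the embedded
# pipe has 5, so the counts are NF=4 -> 3 and NF!=4 -> 1
# (the output order of the two lines may vary between awk versions).
```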
Generate a file with the good records and load it.
awk -F'|' 'NF==4{print}' MyFile.csv > MyFile_good.csv
Generate a file with the bad records and check whether you can fix them manually or in some other way (if you can identify patterns).
awk -F'|' 'NF!=4{print}' MyFile.csv > MyFile_bad.csv
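One such pattern-based fix, sketched here as an assumption rather than part of the original answer: if the stray pipes only ever occur in the 3rd (text) column, and the text never contains TAB characters, then fields 3 through NF-1 of a bad row belong together, and the whole file can be re-emitted tab-delimited in one pass:

```shell
# Sketch, assuming stray pipes occur only in the 3rd column and the text
# contains no TABs: rejoin fields 3..NF-1 and re-emit the row tab-delimited
# so it imports cleanly. Input is fed inline; run it on MyFile.csv in practice.
printf '%s\n' '2|001|some text con|taining pipe|some text' |
awk -F'|' 'BEGIN{OFS="\t"} {
  t = $3
  for (i = 4; i < NF; i++) t = t "|" $i   # glue the split text column back together
  print $1, $2, t, $NF
}'
# Emits the row with TAB delimiters, keeping the pipe inside field 3.
```

Good rows (NF==4) pass through unchanged apart from the new delimiter, so this can also be applied to the whole file instead of splitting it first.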
"1"|"000"|"some text"|"some text"
"2"|"001"|"some text con|taining pipe"|"some text"
"3"|"002"|"some text"|"some text"
The rows above show the same data with quote qualifiers added. Instead of defining a separator (awk -F'|') we now define what a qualified field looks like (FPAT="\"[^\"]*\"").
awk 'BEGIN{OFS="\t";FPAT="\"[^\"]*\""}{rec[NF==4?"NF=4":"NF!=4"]++}END{for(nf in rec){print nf,rec[nf]}}' MyFile.csv
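Note that FPAT is a GNU awk (gawk) extension. The same field-pattern idea can be illustrated portably with grep -o, which extracts every non-overlapping match on its own line (an illustration of mine, not part of the original answer):

```shell
# Extract each quoted field of a sample row using the same pattern "[^"]*".
printf '%s\n' '"2"|"001"|"some text con|taining pipe"|"some text"' |
grep -o '"[^"]*"'
# prints each of the 4 quoted fields on its own line, including the one
# with the embedded pipe:
#   "2"
#   "001"
#   "some text con|taining pipe"
#   "some text"
```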
Upvotes: 1
Reputation: 9927
You can import each entire row into an NVARCHAR(MAX) column and then fix or parse it with T-SQL:
CREATE TABLE MyCSV (
csv NVARCHAR(MAX) NULL -- VARCHAR(MAX) NULL
)
GO
BULK INSERT MyCSV
FROM 'data_file'
WITH (
DATAFILETYPE = 'widechar' --'char'
,FIELDTERMINATOR = '\r\n'
)
-- OR WITH (FORMATFILE='C:\t_floatformat-c-xml.xml');
GO
/*
INSERT INTO MyCSV
VALUES
('1|000|some text|some text')
,('2|001|some text con|taining pipe|some text')
,('3|002|some text|some text')
*/
ALTER TABLE MyCSV
ADD RowID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED
GO
For parsing you can use this function:
-- SELECT * FROM [dbo].[Split2Column] (N'1|000|some text|some text', N'|')
CREATE FUNCTION [dbo].[Split2Column] (
@String NVARCHAR(MAX),
@SepColumn NCHAR(1)
)
RETURNS @Columns TABLE (
[1] NVARCHAR(MAX)
,[2] NVARCHAR(MAX)
,[3] NVARCHAR(MAX)
,[4] NVARCHAR(MAX)
,[5] NVARCHAR(MAX)
,[6] NVARCHAR(MAX)
,[7] NVARCHAR(MAX)
,[8] NVARCHAR(MAX)
,[9] NVARCHAR(MAX)
,[10] NVARCHAR(MAX)
)
AS
BEGIN
;WITH columns (cn, n1, n2 ) AS (
SELECT CAST(1 as int) as cn, CAST(0 as bigint) as n1, CHARINDEX(@SepColumn, @String + @SepColumn) as n2
UNION ALL
SELECT cn + 1, n2 as n1, CHARINDEX(@SepColumn, @String + @SepColumn, n2 + 1) as n2
FROM columns
WHERE n2 < LEN(@String)
)
INSERT INTO @Columns
SELECT [1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
FROM
(
SELECT cn,
SUBSTRING(@String, n1 + 1, n2 - n1 - 1) as val
FROM columns) parsed
PIVOT (
MIN(val) FOR cn IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10])
) pvt
OPTION (MAXRECURSION 0);
RETURN;
END
GO
And here is the result:
SELECT MyCSV.RowID
,[Split2Column].*
FROM MyCSV
CROSS APPLY [dbo].[Split2Column] (MyCSV.csv, N'|')
--WHERE [Split2Column].[5] IS NOT NULL
ORDER BY MyCSV.RowID
RowID 1 2 3 4 5 6 7 8 9 10
1 1 000 some text some text NULL NULL NULL NULL NULL NULL
2 2 001 some text con taining pipe some text NULL NULL NULL NULL NULL
3 3 002 some text some text NULL NULL NULL NULL NULL NULL
Upvotes: 1