Reputation: 427
I have a CSV file that contains 110 rows with an extra semicolon character, which interrupts the read processing of the file. The extra semicolon delimiter occurs in the 3rd position in these lines.
I have the following code to find the 110 rows, but how can I expand on it to remove the 3rd semicolon in these lines?
awk -F \; 'NF != 14' file.csv
Example input:
;1000;"First; Name";132;GB; ... ;Field14;
;1000;(Thank; You-);126;IR; ... ;Field14;
Wanted output:
;1000;"First Name";132;GB; ... ;Field14;
;1000;(Thank You-);126;IR; ... ;Field14;
Where the semicolon inside the field containing "First; Name" and (Thank; You-) has been removed
Upvotes: 1
Views: 259
Reputation: 37404
And the GNU awk FPAT
version:
$ gawk '
BEGIN {
FPAT = "([^;]*)|(\"[^\"]+\")"
OFS=";"
}
{
gsub(/;/,"",$3)
}1' file
Output:
;1000;"First Name";132;GB; ... ;Field14;
More generically for each field:
$ gawk '
BEGIN {
FPAT = "([^;]*)|(\"[^\"]+\")"
OFS=";"
}
{
for(i=1;i<=NF;i++)
gsub(/;/,"",$i)
}1' file
Upvotes: 4
Reputation: 67467
another approach is treating quotes as the delimiters and removing the semicolons between quotes
$ awk 'BEGIN{FS=OFS="\""} {for(i=2;i<=NF;i+=2) gsub(/;/,"",$i)}1' file
;1000;"First Name";132;GB; ... ;Field14;
this may not work if you have escaped quote signs in quoted fields.
Upvotes: 3
Reputation: 246774
I strongly recommend using CSV-specific tools on that CSV data. For example, with csvkit
csvformat -d ';' -D , file | tr -d ';' | csvformat -d , -D ';'
;1000;First Name;132;GB; ... ;Field14;
That converts the delimiters from semicolon to comma, deletes all semicolons, then replaces the delimiters back to semicolons.
The double quotes got removed in the process, but that's OK because the field no longer contains the delimiter character.
Or if you have ruby installed:
ruby -rcsv -e 'CSV.filter(col_sep: ";") {|row| row[2].delete!(";")}' file
Upvotes: 1
Reputation: 133458
With your shown samples, could you please try following. Written and tested in GNU awk
.
awk '
match($0,/"[^;]*[^"]*/){
val=substr($0,RSTART,RLENGTH)
gsub(/;/,"",val)
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
' Input_file
Upvotes: 3
Reputation: 12867
Using sed:
sed -ri 's/(^.*\;.*\;\".*)(\;)(.*\"\;.*$)/\1\3/' file
Enable regular expression interpretation with -E or -r and then split the lines into three sections based on regular expressions, the first being the first 2 ";" delimited fields, and then the third up to a ";", the second section being the ";" and the third section the rest of the line. Replace the line with the first and third sections only.
Upvotes: 2