bullfighter
bullfighter

Reputation: 427

AWK - replace extra semicolon separators in a csv-file

I have a CSV file that contains 110 rows with an extra semicolon character, which interrupts the read processing of the file. The extra semicolon delimiter occurs in the 3rd position in these lines.

I have the following code to find the 110 rows, but how can I expand on it to remove the 3rd semicolon in these lines?

awk -F \; 'NF != 14' file.csv

Example input:

;1000;"First; Name";132;GB; ... ;Field14;
;1000;(Thank; You-);126;IR; ... ;Field14;

Wanted output:

;1000;"First Name";132;GB; ... ;Field14;
;1000;(Thank You-);126;IR; ... ;Field14;

Where the semicolon inside the field containing "First; Name" and (Thank; You-) has been removed

Upvotes: 1

Views: 259

Answers (5)

James Brown
James Brown

Reputation: 37404

And the GNU awk FPAT version:

$ gawk '
BEGIN {
    FPAT = "([^;]*)|(\"[^\"]+\")"
    OFS=";"
}
{
    gsub(/;/,"",$3)
}1' file

Output:

;1000;"First Name";132;GB; ... ;Field14;

More generically for each field:

$ gawk '
BEGIN {
    FPAT = "([^;]*)|(\"[^\"]+\")"
    OFS=";"
}
{
    for(i=1;i<=NF;i++)
        gsub(/;/,"",$i)
}1' file

Upvotes: 4

karakfa
karakfa

Reputation: 67467

another approach is treating quotes as the delimiters and removing the semicolons between quotes

$ awk 'BEGIN{FS=OFS="\""} {for(i=2;i<=NF;i+=2) gsub(/;/,"",$i)}1' file

;1000;"First Name";132;GB; ... ;Field14;

this may not work if you have escaped quote signs in quoted fields.

Upvotes: 3

glenn jackman
glenn jackman

Reputation: 246774

I strongly recommend using CSV-specific tools on that CSV data. For example, with csvkit

csvformat -d ';' -D , file | tr -d ';' | csvformat -d , -D ';'
;1000;First Name;132;GB; ... ;Field14;

That converts the delimiters from semicolon to comma, deletes all semicolons, then replaces the delimiters back to semicolons.

The double quotes got removed in the process, but that's OK because the field no longer contains the delimiter character.


Or if you have installed:

ruby -rcsv -e 'CSV.filter(col_sep: ";") {|row| row[2].delete!(";")}' file

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133458

With your shown samples, could you please try following. Written and tested in GNU awk.

awk '
match($0,/"[^;]*[^"]*/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/;/,"",val)
  print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
'  Input_file

Upvotes: 3

Raman Sailopal
Raman Sailopal

Reputation: 12867

Using sed:

sed -ri 's/(^.*\;.*\;\".*)(\;)(.*\"\;.*$)/\1\3/' file

Enable regular expression interpretation with -E or -r and then split the lines into three sections based on regular expressions, the first being the first 2 ";" delimited fields, and then the third up to a ";", the second section being the ";" and the third section the rest of the line. Replace the line with the first and third sections only.

Upvotes: 2

Related Questions