SaltedPork
SaltedPork

Reputation: 377

Convert TSV to CSV, where TSV field has commas in it

I have a TSV with fields that look like:

name   location   1,2,3,4,5

When I use sed 's/\w/,/g' i end up with a csv where 1,2,3,4 and 5 are considered seperate entrys. I would like it to be '1 2 3 4 5' I've tried converting commas to white space before running the above command using

sed 's/,/\w/g' 

however when converting the whitespace back to commas it includes single white spaces as well as the tabs, so what is the regex for just a single whitespace character?

Desired output:

name, location,1 2 3 4 5,

Upvotes: 0

Views: 1138

Answers (5)

Jeffrey W.
Jeffrey W.

Reputation: 1

sed 's/\t/","/g; s/^\|$/"/g' file doesn't work in MacOS

Instead use sed 's/\t/","/g;s/^/"/;s/$/"/' file for MacOS.

Upvotes: 0

Aaron
Aaron

Reputation: 24812

As mentionned in a comment CSV usually deals with occurences of its separator character in values by enclosing the value in quotes, so I suggest you simply deal with this by enclosing every value in quotes :

sed -E 's/([^\t]*)(\t|$)/"\1",/g'

You can try it here.

This leaves a trailing comma as in your sample output, if you want to avoid it you can use the following :

sed -E 's/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'

If your original data contains " you will however need to escape those, which you can achieve by adding the following substitution before the other(s) :

s/"/\\"/g

As Ed Morton suggests we can also strip the trailing empty fields :

s/\t+$//

In conclusion I'd use the following :

sed -E 's/"/\\"/g;s/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'

which you can try here.

Upvotes: 2

Ed Morton
Ed Morton

Reputation: 203532

Depending on your real requirements:

$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) $i="\""$i"\""} 1' file
"name","location","1,2,3,4,5"

$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $1=$1} 1' file
name,location,1 2 3 4 5

$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $(NF+1)=""} 1' file
name,location,1 2 3 4 5,

$ echo 'a"b' | awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) { gsub(/"/,"\"\"",$i); $i="\""$i"\"" } } 1'
"a""b"

Upvotes: 0

James Brown
James Brown

Reputation: 37404

And in awk:

$ awk -v OFS="," '{for(i=1;i<=NF;i++)if($i~/,/)$i="\"" $i "\"";$1=$1}1' file
name,location,"1,2,3,4,5"

Explained:

$ awk -v OFS="," '{          # output delimiter to a comma *
    for(i=1;i<=NF;i++)       # loop all fields
        if($i~/,/)           # if comma in field
            $i="\"" $i "\""  # surround with quotes **
    $1=$1                    # rebuild record
}1' file                     # output

* if there is space in the record, consider input field separator to a tab with awk -F"\t".

** also, if there are quotes in the fields with commas, maybe they should be duplicated or escaped.

Upvotes: 1

oguz ismail
oguz ismail

Reputation: 50750

Either replace tabs with "," and enclose lines between double quotes, or replace commas with spaces and tabs with commas. In both cases you'll get valid CSV.

$ cat file
name    location        1,2,3,4,5
$
$ sed 's/\t/","/g; s/^\|$/"/g' file
"name","location","1,2,3,4,5"
$
$ sed 's/,/ /g; s/\t/,/g' file
name,location,1 2 3 4 5

Upvotes: 1

Related Questions