Reputation: 377
I have a TSV with fields that look like:
name location 1,2,3,4,5
When I use sed 's/\w/,/g'
i end up with a csv where 1,2,3,4 and 5 are considered seperate entrys.
I would like it to be '1 2 3 4 5'
I've tried converting commas to white space before running the above command using
sed 's/,/\w/g'
however when converting the whitespace back to commas it includes single white spaces as well as the tabs, so what is the regex for just a single whitespace character?
Desired output:
name, location,1 2 3 4 5,
Upvotes: 0
Views: 1138
Reputation: 1
sed 's/\t/","/g; s/^\|$/"/g' file
doesn't work in MacOS
Instead use
sed 's/\t/","/g;s/^/"/;s/$/"/' file
for MacOS.
Upvotes: 0
Reputation: 24812
As mentionned in a comment CSV usually deals with occurences of its separator character in values by enclosing the value in quotes, so I suggest you simply deal with this by enclosing every value in quotes :
sed -E 's/([^\t]*)(\t|$)/"\1",/g'
You can try it here.
This leaves a trailing comma as in your sample output, if you want to avoid it you can use the following :
sed -E 's/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'
If your original data contains "
you will however need to escape those, which you can achieve by adding the following substitution before the other(s) :
s/"/\\"/g
As Ed Morton suggests we can also strip the trailing empty fields :
s/\t+$//
In conclusion I'd use the following :
sed -E 's/"/\\"/g;s/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'
which you can try here.
Upvotes: 2
Reputation: 203532
Depending on your real requirements:
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) $i="\""$i"\""} 1' file
"name","location","1,2,3,4,5"
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $1=$1} 1' file
name,location,1 2 3 4 5
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $(NF+1)=""} 1' file
name,location,1 2 3 4 5,
$ echo 'a"b' | awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) { gsub(/"/,"\"\"",$i); $i="\""$i"\"" } } 1'
"a""b"
Upvotes: 0
Reputation: 37404
And in awk:
$ awk -v OFS="," '{for(i=1;i<=NF;i++)if($i~/,/)$i="\"" $i "\"";$1=$1}1' file
name,location,"1,2,3,4,5"
Explained:
$ awk -v OFS="," '{ # output delimiter to a comma *
for(i=1;i<=NF;i++) # loop all fields
if($i~/,/) # if comma in field
$i="\"" $i "\"" # surround with quotes **
$1=$1 # rebuild record
}1' file # output
* if there is space in the record, consider input field separator to a tab with awk -F"\t"
.
** also, if there are quotes in the fields with commas, maybe they should be duplicated or escaped.
Upvotes: 1
Reputation: 50750
Either replace tabs with ","
and enclose lines between double quotes, or replace commas with spaces and tabs with commas. In both cases you'll get valid CSV.
$ cat file
name location 1,2,3,4,5
$
$ sed 's/\t/","/g; s/^\|$/"/g' file
"name","location","1,2,3,4,5"
$
$ sed 's/,/ /g; s/\t/,/g' file
name,location,1 2 3 4 5
Upvotes: 1