Aaron
Aaron

Reputation: 1455

Convert CSV to TSV

How do you convert this csv file into a tab delimited file?

"Country","Percent","Percent of patients","home health","home health agency","friends and family","Surveys","Response"
"Nation","88","85","83","84","78",,

Notice both the Surverys and Response columns are empty strings.

I use this code to convert it to a tab file -

sed 's/\"\,\"/\"\t\"/g'
sed 's/\,\,/\t""\t/g'

It doesn't convert the last column though. This is the output I get (Notice the last column is omitted) -

"Country"   "Percent"   "Percent of patients"   "home health"   "home health agency"    "friends and family"    "Surveys"   "Response"
"Nation"        "88"    "85"    "83"    "84"    "78"    ""

There are 8 columns in the header and only 7 columns in the tab delimited data, so the last column is missed.

UPDATE

My column names have commas in them.

Upvotes: 1

Views: 3295

Answers (3)

Cyrus
Cyrus

Reputation: 88674

With GNU awk.

awk 'BEGIN{FS="\",\""; OFS="\t"} {FS=","; for(i=1; i<=NF; i++) {gsub(/"/,"",$i); $i="\"" $i "\""} print}' file

Output:

"Country"      "Percent"       "Percent of patients"   "home health"   "home health agency"    "friends and family"       "Surveys"       "Response"
"Nation"        "88"    "85"    "83"    "84"    "78"    ""      ""

Upvotes: 0

mivk
mivk

Reputation: 14919

Actually, your last column is not missing at all. You just don't see it because it's a tab. You can check it with xxd.

Also, you don't need to escape commas and double-quotes in sed since you have everything inside single quotes.

sed 's/","/"\t"/g; s/,,/\t""\t/g;' $YOUR_CSV | xxd | tail -1.

This shows that the last line ends with a tab (x09) after the last quote and before the ending newline :

00000090: 3834 2209 2237 3822 0922 2209 0a         84"."78".""..

What is "missing" is the double-quotes around that last empty value. But you don't need them. If you really want these double-quotes, you can add this to your sed command:

s/\t$/\t""/

It will replace a tab at the end of a line with a tab followed by 2 double-quotes.

However, this simple sed substitution can of course easily fail depending on your data.

For example (using - instead of \t for visibility), if you have empty columns which are not at the end:

echo '"Nation","88",,,"84","78",,' | sed 's/","/"-"/g; s/,,/-""-/g;'

Will output

"Nation"-"88"-""-,"84"-"78"-""-

(notice the comma before "84")

So I would suggest using a dedicated tool rather than a quick sed line. For example, csvtool is available in most distributions (sudo apt install csvtool for Debian based).

csvtool -t COMMA -u TAB cat $YOUR_CSV

Upvotes: 1

Jotne
Jotne

Reputation: 41454

I do use FPAT with gnu awk to handle CSV files

awk -v FPAT='([^,]+)|("[^"]+")' -v OFS='\t' '{$1=$1}1' file
"Country"       "Percent"       "Percent of patients"   "home health"   "home health agency"    "friends and family"    "Surveys"       "Response"
"Nation"        "88"    "85"    "83"    "84"    "78"

Not sure what you mean by last column is missing. If its empty ,, you will not see it.

Upvotes: 0

Related Questions