Reputation: 49
My CSV file has records in the following format:
571283,1,"R","01/15/2002","IBMS,SL"
I want to convert them to the following format:
571283|1|R|01/15/2002|IBMS,SL
I tried this:
sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
but the output I am getting is:
571283,1|R|01/15/2002|IBMS,SL
Please advise.
Upvotes: 1
Views: 7498
Reputation: 1
sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' -e 's/\,/\|/g' btest.txt
Upvotes: 0
Reputation: 437953
Generally, using a bona fide CSV parser is the simplest and most robust choice when it comes to parsing CSV data.
For instance, python
's CSV parsing provides a straightforward solution:
$ python -c 'import csv,sys; reader=csv.reader(sys.stdin)
for row in reader:
print("|".join(row))' < test.csv
571283|1|R|01/15/2002|IBMS,SL
As a one-liner (in bash
, ksh
, or zsh
):
python -c $'import csv,sys; reader=csv.reader(sys.stdin)\nfor row in reader:\n print("|".join(row))' < test.csv
Upvotes: 0
Reputation: 2783
This works:
sed -e 's/,/|/' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
Result is:
571283|1|R|01/15/2002|IBMS,SL
Your first sequence:
-e 's/ //g'
has to be changed in:
-e 's/,/|/'
Expanding to reply to your comment.
First of all you have to take care of the fact that sed is sequential, so the order of the transformations is important.
In your string:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
You have same characters that you want to transform in different ways. The use of g for global and the sequence of transformations are therefore very important.
Let us build the sequence:
first of all let us get rid of ",, that we want to transform in ||:
sed -e 's/\"\,\,/||/' test.csv
will give us:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800||
Then we do the same with ," that we want to become |:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' test.csv
gives:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015|330930929, 330931800||
Now we still have 2 commas we want to transform in | but not the third, the simply way is to repeat twice that transformation:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' test.csv
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
That is!
Advise:
But finally I think you need to transform BOTH strings at the same time, so that:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
571283,1,"R","01/15/2002","IBMS,SL"
becomes:
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
571283|1|R|01/15/2002|IBMS,SL
This is the sequence of transformations that does that:
sed -e 's/\",\"/|/' -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' -e 's/\"//g' test.csv
Regards
Upvotes: 1
Reputation: 785186
Using gnu-awk
with FPAT
:
awk -v FPAT='"[^"]+"|[^,]+' -v OFS='|' '{for(i=1; i<=NF; i++) gsub(/"/, "", $i)} 1' file
571283|1|R|01/15/2002|IBMS,SL
In case gnu-awk
is unavailable use this perl
command:
perl -pe 's/(?=(([^"]*"){2})*[^"]*$),/|/g; s/"//g' file
571283|1|R|01/15/2002|IBMS,SL
Upvotes: 4