Reputation: 217
Input file:
"1","2col",""3col " "
"2","2col"," "3c,ol " "
"3","2col"," 3co,l"
"4","2col","3co,l"
"5","2col",""3co,l "" "
"6","2col",""3c,ol ""3c,ol"""
Output file:
"1","2col","3col "
"2","2col"," 3c,ol "
"3","2col"," 3co,l"
"4","2col","3co,l"
"5","2col","3co,l "
"6","2col","3c,ol 3c,ol"
Please help me get the above output using Unix commands. Note that only the 3rd column changes in the output: all of its internal double quotes have been removed.
The comma is the field terminator, except when it appears between double quotes, in which case it is plain text. For example, on the 6th line the commas after the 2nd comma sit inside double quotes as text, which is fine.
What I have tried so far:
sed 's/""|/|/g'
sed -e "s/\"\"//g"
perl -pe 's/(?<!^)(?<!\,)"(?!\,)(?!$)/""/g'
Upvotes: 0
Views: 80
Reputation: 20022
Use the double quote as the field separator to find the first two fields, then concatenate the remaining fields.
awk -F '"' '
BEGIN {q="\""}
{printf "%s", q$2q$3q$4q$5q; for (i=6;i<=NF;i++) printf "%s", $i; print q}
' inputfile
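To see why `$2` through `$5` are the first two columns, it can help to print the fields awk produces when it splits a sample line on `"`. This check uses the first input line from the question; the field positions only hold as long as the first two columns are quoted and contain no quotes of their own.

```shell
# Print every field awk sees when " is the field separator,
# using the first sample line from the question.
echo '"1","2col",""3col " "' |
awk -F '"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
```

Fields 2 to 5 are `1`, `,`, `2col` and `,`, so `q$2q$3q$4q$5q` rebuilds `"1","2col","`. Everything from `$6` onward is the third column with its quotes already stripped by the split, which is why the loop simply concatenates those fields.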
EDIT: An alternative
paste -d, <( cut -d"," -f1,2 < inputfile) \
<( cut -d"," -f3- < inputfile | sed 's/"//g;s/.*/"&"/')
EDIT: Another alternative
sed 's/old/new/g': Apply the replacement to all matches to the regexp.
sed 's/old/new/number': Only replace the numberth match of the regexp.
When you mix the g and number modifiers in GNU sed, matches before the numberth are ignored, and every match from the numberth onward is replaced.
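In this case, the first five quotes are the ones around the first two columns plus the opening quote of the third, so the command deletes every quote from the 6th onward and then appends the closing quote: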
sed -r 's/"//g6;s/$/"/' inputfile
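The `g`-plus-number behaviour can be checked in isolation. This small sketch assumes GNU sed: POSIX leaves this flag combination unspecified, so other sed implementations may reject it or behave differently.

```shell
# GNU sed: with g3, the first two matches are left alone
# and every match from the third onward is replaced.
echo 'aaaaa' | sed 's/a/b/g3'

# The same idea on the 6th sample line: keep the first five quotes,
# delete the rest, then re-add the closing quote at the end of the line.
echo '"6","2col",""3c,ol ""3c,ol"""' | sed 's/"//g6;s/$/"/'
```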
Upvotes: 0
Reputation: 12448
Assumption: the first and second columns are "clean", e.g. they do not contain a comma.
Input:
"1","2col",""3col " "
"2","2col"," "3c,ol " "
"3","2col"," 3co,l"
"4","2col","3co,l"
"5","2col",""3co,l "" "
"6","2col",""3c,ol ""3c,ol"""
Command:
tr -d '"' < input | awk -F',' -v OFS=',' '{$1="\""$1"\"";$2="\""$2"\"";printf $1 OFS $2 OFS "\"";for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS}'
Output:
"1","2col","3col "
"2","2col"," 3c,ol "
"3","2col"," 3co,l"
"4","2col","3co,l"
"5","2col","3co,l "
"6","2col","3c,ol 3c,ol"
Explanations:
tr -d '"' < input
removes all the " characters.
| awk
pipes the output to awk.
-F',' -v OFS=','
sets both the input and the output field separator to a comma.
$1="\""$1"\"";$2="\""$2"\"";
wraps the first two fields in double quotes, and
printf $1 OFS $2 OFS "\"";
prints them, followed by the opening quote of the third column.
for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS}
joins the remaining fields back together with commas and adds the closing " at the end of the line.
The same awk program, formatted for readability:
'{
$1="\""$1"\""
$2="\""$2"\""
printf $1 OFS $2 OFS "\""
for(u=3;u<=NF;u++)
{
if(u!=NF)printf $u OFS
else printf $u
}
printf "\"" RS
}'
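One caveat with the awk program above: it passes the data to `printf` as the format string, so a `%` in the input would be treated as the start of a conversion specification. Below is a minimal sketch of a variant under the same assumption (the first two columns contain no commas), wrapped in a hypothetical shell function `fix_quotes`. It prints through a fixed `%s` format and takes the third column as everything after the second comma.

```shell
# Hypothetical helper name: reads the CSV on stdin, writes the fixed CSV on stdout.
fix_quotes() {
    tr -d '"' | awk -F',' '
    {
        # Column 3 is everything after the second comma and may contain commas itself.
        rest = $0
        sub(/^[^,]*,[^,]*,/, "", rest)
        # A fixed format string means a % in the data is printed literally.
        printf "\"%s\",\"%s\",\"%s\"\n", $1, $2, rest
    }'
}
```

Run it as `fix_quotes < input`. Taking the remainder with `sub()` instead of rejoining `$3..$NF` gives the same result for the sample data, because the commas inside the third column are left untouched.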
Upvotes: 1