swet

Reputation: 217

Unix - Remove internal double quote with terminator comma

Input file:

"1","2col",""3col " "
"2","2col"," "3c,ol     " "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col",""3co,l      ""              "
"6","2col",""3c,ol ""3c,ol"""

Output file:

"1","2col","3col    "
"2","2col"," 3c,ol       "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col","3co,l                       "
"6","2col","3c,ol 3c,ol"

How can I produce the output above with a Unix command? Note that only the 3rd column is modified: all of its internal double quotes have been removed.

The comma is the field terminator, but a comma between double quotes is not treated as a terminator. See line 6: after the 2nd comma, the commas inside the double quotes are part of the text, which is fine.

What I have tried so far:

sed 's/""|/|/g'
sed -e "s/\"\"//g"
perl -pe 's/(?<!^)(?<!\,)"(?!\,)(?!$)/""/g'

Upvotes: 0

Views: 80

Answers (2)

Walter A

Reputation: 20022

Use the quotes as the field separator to find the first two fields, then concatenate the remaining fields.

awk -F '"' '
   BEGIN {q="\""}
   {printf "%s", q$2q$3q$4q$5q; for (i=6;i<=NF;i++) printf "%s", $i; print q}
   ' inputfile
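To see why the first two columns end up in $2 and $4, it can help to print the fields awk produces when the double quote is the separator. This is only a minimal sketch using line 6 of the input; the helper name show_fields is made up for illustration.

```shell
# show_fields is a hypothetical helper: print every field awk sees
# when the double quote is used as the field separator.
show_fields() {
  awk -F '"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '"6","2col",""3c,ol ""3c,ol"""' | show_fields
```

Fields 2 and 4 hold the first two columns, fields 3 and 5 hold the separating commas, and everything from field 6 onward belongs to the third column, which is why the loop starts at 6.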

EDIT: An alternative

paste -d, <( cut -d"," -f1,2 < inputfile) \
          <( cut -d"," -f3-  < inputfile | sed 's/"//g;s/.*/"&"/')

EDIT: Another alternative

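sed 's/old/new/g': Apply the replacement to all matches of the regexp.
sed 's/old/new/number': Only replace the number-th match of the regexp.
When you mix the g and number modifiers in GNU sed, the matches before the number-th one are ignored, and the number-th match and all following matches are replaced.
In this case, the first five quotes are kept, every quote from the sixth onward is removed, and a closing quote is appended: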

sed -r 's/"//g6;s/$/"/' inputfile
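The effect of combining a number with g is easier to see on a short string. A minimal sketch, assuming GNU sed (other sed implementations may not accept or may interpret this combination differently); the helper name and the sample string are invented for illustration.

```shell
# Hypothetical helper: replace the 3rd and every later "-" with "+".
# Combining a number with g relies on GNU sed behaviour.
from_third() {
  sed 's/-/+/3g'
}

printf '%s\n' 'a-b-c-d-e' | from_third   # a-b-c+d+e
```

The first two dashes are left alone, just as the first five quotes are left alone in the command for the question.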

Upvotes: 0

Allan

Reputation: 12448

Hypothesis: the first and second columns are "clean" (for example, they do not contain a comma).

Input:

"1","2col",""3col " "
"2","2col"," "3c,ol     " "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col",""3co,l      ""              "
"6","2col",""3c,ol ""3c,ol"""

Command:

tr -d '"' < input | awk -F',' -v OFS=',' '{$1="\""$1"\"";$2="\""$2"\"";printf $1 OFS $2 OFS "\"";for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS}'

Output:

"1","2col","3col  "
"2","2col"," 3c,ol      "
"3","2col"," 3co,l     "
"4","2col","3co,l"
"5","2col","3co,l                    "
"6","2col","3c,ol 3c,ol"

Explanations:

  • tr -d '"' < input removes every " from the input
  • | awk pipes the result to awk
  • -F',' -v OFS=',' sets both the input and the output field separator to a comma
  • $1="\""$1"\"";$2="\""$2"\""; wraps the first 2 columns in ", and printf $1 OFS $2 OFS "\""; prints them followed by the opening " of the third column
  • for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS} joins the remaining fields back together with commas and adds the closing " at the end of the line.

For readability:

'{
  $1="\""$1"\""
  $2="\""$2"\""
  printf $1 OFS $2 OFS "\""
  for(u=3;u<=NF;u++)
  {
    if(u!=NF)printf $u OFS
    else printf $u
  }
  printf "\"" RS
}'
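One caveat with the script above: printf treats its first argument as a format string, so a field containing % would be interpreted rather than printed as-is. A hedged variant that passes every value through "%s" could look like the sketch below. The function name requote and the sample line are illustrative assumptions, and it relies on the same hypothesis that columns 1 and 2 contain no comma.

```shell
# requote is a hypothetical wrapper around the same tr | awk idea.
requote() {
  tr -d '"' | awk -F',' '{
    # Re-quote the first two columns and open the quote of the third.
    printf "\"%s\",\"%s\",\"", $1, $2
    # Rejoin the remaining fields; commas inside column 3 are restored.
    for (u = 3; u <= NF; u++)
      printf "%s%s", $u, (u < NF ? "," : "")
    # Close the third column and end the line.
    print "\""
  }'
}

printf '%s\n' '"7","2col",""50%,ol ""x"""' | requote
```

Because the data is always an argument to "%s" and never the format itself, a value such as 50% is printed unchanged.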

Upvotes: 1
