Karsten

Reputation: 61

Replacing double quotes in CSV

I have roughly the following problem and couldn't find a solution. My CSV file is structured like this:

1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B

As you can see, some " characters are used for inches and around the L, inside the enclosing " of a field.

Now I'm looking for a UNIX shell script that replaces the inch " and the double quotes around L with two single quotes, along the lines of the following:

sed "s/$OLD/$NEW/g" "$QFILE" > "$TFILE" && mv "$TFILE" "$QFILE"

Can anyone help me?

Upvotes: 6

Views: 1007

Answers (3)

bmk

Reputation: 14137

Maybe this is what you want:

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"

I.e.: find each double quote (") that follows a digit ([0-9]) and is not followed by a semicolon ([^;]), and replace it with two single quotes.
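As a quick check of what this command does, here is a minimal sketch that feeds it the first sample row from the question. The variable name `line` is only for illustration. The quotes around L are left alone at this stage, since they do not follow a digit.

```shell
# First sample row from the question (single-quoted so the shell leaves it alone).
line='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

# Replace a double quote that follows a digit and precedes a non-semicolon.
printf '%s\n' "$line" | sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"
# prints: 1223;"B630521 ("L" fixed bracket)";"2'' width";"length: 5''";2;alternate A
```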

Edit: I can extend my command (it's becoming quite long now):

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"

As you are using SunOS, I guess you cannot use extended regular expressions (sed -r), so I did it this way. The first s command replaces all inch " with ''. The second and third s commands are identical: they replace every " that is not a direct neighbor of a ; with a single '. It has to be done twice to catch the second " of e.g. "L", because there is only one character between the two " and that character has already been consumed by \([^;]\) in the first match. This way you would also replace "" with ''. If you have """ or """" etc., you have to add one more (but only one more) s command.
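To see why the same s command appears twice, the sketch below wraps it in a hypothetical helper, `one_pass`, and runs it once and then twice on a short made-up fragment. It assumes a POSIX shell and sed, and writes the replacement as a plain ' instead of \', which should behave the same here.

```shell
# Hypothetical helper: one run of the repeated s command.
one_pass() {
    sed "s/\([^;]\)\"\([^;]\)/\1'\2/g"
}

# One pass: the L is consumed as \2 of the first match, so the quote
# after it cannot start a new match and is left in place.
printf '%s\n' '("L" fixed)' | one_pass
# prints: ('L" fixed)

# Two passes: the second run picks up the remaining quote.
printf '%s\n' '("L" fixed)' | one_pass | one_pass
# prints: ('L' fixed)
```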

Upvotes: 3

anubhava

Reputation: 785196

Update (using Perl this is easy, since you get full lookahead and lookbehind support):

perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file

Output

1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B

Using sed and grep only

Using only grep and sed (and not perl, php, python, etc.), a not-so-elegant solution is:

grep -o '[^;]*' file | sed 's/^"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g'

Output - for your input file it gives:

1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
  • grep -o basically splits the input on ;, printing one field per line
  • sed first replaces the " at the start of the line with a ` placeholder
  • then it replaces the " at the end of the line with another `
  • it then replaces all remaining double quotes " with a single quote '
  • finally it turns the ` placeholders back into " at the start and end
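
If each row needs to stay on one line, the same per-field idea can be sketched with awk instead of grep and sed. This is a swapped-in technique, not part of the commands above. It assumes an awk that supports -v and gsub (on SunOS that may mean nawk or /usr/xpg4/bin/awk) and that no field contains a ; itself. The names `fix_inner_quotes` and `repl` are hypothetical.

```shell
# Hypothetical helper: for every field wrapped in double quotes, replace the
# double quotes inside it and keep the outer pair. Rows stay on one line.
fix_inner_quotes() {
    awk -F';' -v OFS=';' -v repl="'" '{
        for (i = 1; i <= NF; i++) {
            n = length($i)
            # Only touch fields that start and end with a double quote.
            if (n >= 2 && substr($i, 1, 1) == "\"" && substr($i, n, 1) == "\"") {
                inner = substr($i, 2, n - 2)
                gsub(/"/, repl, inner)
                $i = "\"" inner "\""
            }
        }
        print
    }' "$@"
}

printf '%s\n' '1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A' | fix_inner_quotes
# prints: 1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
```

Passing a file name instead of piping works the same way (fix_inner_quotes file), and setting repl to two single quotes gives the replacement asked for in the question.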

Upvotes: 3

Jav_Rock

Reputation: 22245

For the "L" try this:

sed "s/\"L\"/'L'/g"

For inches you can try:

sed "s/\([0-9]\)\"\"/\1''\"/g"

I am not sure it is the best option, but I have tried it and it works. I hope this is helpful.
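
A minimal sketch for trying both commands together on the first sample row from the question (the variable name `line` is only for illustration). On this row, the second pattern only matches an inch mark that sits directly before the closing quote of a field, so a value like "2" width" would need a different pattern, such as the one in bmk's answer.

```shell
line='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

# First the quotes around L, then a digit followed by two double quotes.
printf '%s\n' "$line" | sed "s/\"L\"/'L'/g" | sed "s/\([0-9]\)\"\"/\1''\"/g"
# prints: 1223;"B630521 ('L' fixed bracket)";"2" width";"length: 5''";2;alternate A
```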

Upvotes: 2
