menteith

Reputation: 678

Replace double quotes that come in pairs

I'd like to replace double quote (") characters that come in pairs. Let me explain what I mean.

"Some sentence"

Here the double quotes should be replaced because they come in a pair.

"Some sentence

Here nothing should be replaced - there is no matching pair for the first quote character.

I'd like to replace the first quote character with „:

❯ echo „ |hexdump -C
00000000  e2 80 9e 0a

And the second quote character with ”:

❯ echo ” |hexdump -C
00000000  e2 80 9d 0a

Summing it up, the following:

Hi, "how
are you"

Should become the following after the replacement is made:

Hi, „how
are you”

I've come up with the following code, but it fails to work:

sed -r 's/(\")(.+)(\")/\1\xe2\x80\x9e\3\xe2\x80\x9d/g'

For example, " hi " gives "„"”.

EDIT: As requested in the comments, here is a sample from a file to be modified. Important note: the file is structured, which may help. It is always an SRT file, i.e. the movie subtitle format.

104
00:10:25,332 --> 00:10:27,876
Kobieta mówi do drugiej:
"Widzisz to, co ja?"

105
00:10:28,001 --> 00:10:30,904
A tamta: "No to co?
Każdy wygląda tak samo."

Upvotes: 0

Views: 98

Answers (2)

TrentP

Reputation: 4722

Your expression doesn't work because you have three capturing groups, the three sets of (). You are putting the 1st (the first quote) and the 3rd (the last quote) in the output and ignoring the 2nd, which is the part you want to keep.

There's no reason to capture the quotes, since you don't want to inject them into the output. Only the bit in the middle needs to be captured.

There is also a second flaw: the (.+) is greedy and will itself match a string containing a quote. So /"(.+)"/ would match the entire sequence "one"two", with the capture, (.+), matching one"two. Use [^"]+ to match a sequence of non-quote characters instead.
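To see the difference on a single line, here is a small sketch. The function names greedy and paired are only illustrative wrappers, not anything sed provides:

```shell
# Greedy version: .+ runs from the first quote to the LAST quote on the line.
greedy() { sed -E 's/"(.+)"/„\1”/g'; }

# Negated class: [^"]+ cannot cross a quote, so each pair is matched on its own.
paired() { sed -E 's/"([^"]+)"/„\1”/g'; }

printf '%s\n' '"one" and "two"' | greedy   # „one" and "two”
printf '%s\n' '"one" and "two"' | paired   # „one” and „two”
```

The greedy version leaves the two inner quotes untouched, while the negated class converts both pairs.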

Fixing this, and treating the entire text file as one line with -z (which only works if there are no NUL characters in the file), this appears to work:

sed -zE 's/"([^"]+)"/„\1”/g'
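As a quick check against the sample from the question, here is a sketch that assumes GNU sed (-z is a GNU extension). The wrapper name fix_quotes is made up for the example, and the closing quote is ” (e2 80 9d), as the question asks:

```shell
# Assumes GNU sed: -z makes sed treat the whole input as one record,
# so [^"]+ can span the newline inside a quoted passage.
fix_quotes() { sed -zE 's/"([^"]+)"/„\1”/g'; }

# One subtitle block from the question, with a quote spanning two lines.
printf 'A tamta: "No to co?\nKażdy wygląda tak samo."\n' | fix_quotes
```

One thing to be aware of: because the whole file is a single record, an unmatched " will pair with the next " anywhere later in the file, even in a different subtitle block.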

Upvotes: 1

Renaud Pacalet

Reputation: 29290

sed -rn ':a;s/"([^"]*)"/„\1”/g;/"/!{p;b;};$p;N;ba'

It substitutes all "xx" with „xx”. If the result contains no more ", it is printed and we restart with the next line. Otherwise we append the next line and try again. The $p is only there to print the last lines if they contain a dangling ".
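For readability, the same script can be spread over several lines with comments. This is only a sketch of the one-liner above, assuming GNU sed; the wrapper name fix_quotes is made up for the example:

```shell
fix_quotes() {
  sed -rn '
    :a
    # Convert every complete pair currently in the pattern space.
    s/"([^"]*)"/„\1”/g
    # No quote left: print the result and start over with the next line.
    /"/!{
      p
      b
    }
    # A dangling quote remains. On the last line, print what we have.
    $p
    # Otherwise append the next input line and try again.
    N
    ba
  '
}

printf 'Hi, "how\nare you"\n' | fix_quotes
```

Unlike the -z approach, this only joins lines while a quote is still open, so input without dangling quotes is processed a line at a time.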

Upvotes: 0
