ABK

Reputation: 511

Using Regex to replace double quotes in the middle of a word

I have a string which looks like this:

my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

I'd like to replace the double quotes that are in the middle of a word with single quotes, without changing the ones at the beginning or at the end. In other words, I'd like my_string to look like this:

'''[u"column1" : u"abcd", u"column2" : u"te'st"]'''

Right now, I'm just using a workaround solution to do so. Basically, my solution replaces the double quotes that are in the middle of words if they are not preceded by the letter u. Here is what it looks like:

import re

# find every double quote that sits between two alphanumerics and is not preceded by "u"
unusual=re.findall(r'([a-tv-zA-TV-Z0-9]\"[a-zA-Z0-9])', my_string)
if unusual:
  for un in unusual:
    my_string=my_string.replace(un, un.replace('"', "'"))

This works for me for now, but I'd like to improve it: if a word has a u right before the inner double quote, that quote is no longer replaced. For example: my_string='''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
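To make the failure concrete, here is a minimal sketch that wraps the workaround above in a function and runs it on both strings. The function name replace_inner_quotes is hypothetical and only used for illustration.

```python
import re

def replace_inner_quotes(text):
    # The workaround from the question, wrapped in a (hypothetical) helper.
    for un in re.findall(r'([a-tv-zA-TV-Z0-9]"[a-zA-Z0-9])', text):
        text = text.replace(un, un.replace('"', "'"))
    return text

works = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
fails = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''

works_result = replace_inner_quotes(works)
fails_result = replace_inner_quotes(fails)

print(works_result)
# [u"column1" : u"abcd", u"column2" : u"te'st"]
print(fails_result)
# [u"column1" : u"abcd", u"column2" : u"teu"st"]
```

The second string comes back unchanged because the character before the inner quote is a u, which the character class deliberately excludes.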

Can I get some help with this, guys? I'm running out of ideas :)

PS: I'm using Python 2.7

Upvotes: 2

Views: 1899

Answers (3)

Jan

Reputation: 43169

You could try to use lookarounds (not 100% perfect):

(?<=\w)(?<![\[\s:]u)"(?=\w)

and replace these occurrences with ', see a demo on regex101.com.


Broken down, this says:

(?<=\w)       # require a word character immediately before
(?<![\[\s:]u) # but not a u that is itself preceded by [, : or whitespace
"             # a double quote
(?=\w)        # require a word character immediately after


In Python:

import re

my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
rx = re.compile(r'(?<=\w)(?<![\[\s:]u)"(?=\w)')

new_string = rx.sub("'", my_string)
print(new_string)
# [u"column1" : u"abcd", u"column2" : u"te'st"]
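As a quick check, the same pattern can be run on the second string from the question, where a u sits right before the inner quote. The sketch below also includes a made-up input (not from the question) showing one case the lookbehind cannot tell apart, which may be part of why the pattern is not 100% perfect.

```python
import re

rx = re.compile(r'(?<=\w)(?<![\[\s:]u)"(?=\w)')

# Second example from the question: a u directly before the inner quote.
with_u = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
fixed_u = rx.sub("'", with_u)
print(fixed_u)
# [u"column1" : u"abcd", u"column2" : u"teu'st"]

# Made-up input: the inner quote follows " u", so it looks like an opening quote.
ambiguous = '''[u"key" : u"menu u"x"]'''
unchanged = rx.sub("'", ambiguous)
print(unchanged == ambiguous)
# True
```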

Better yet: fix the string where it came from.

Upvotes: 2

revo

Reputation: 48711

Be careful with both the input string and the approach you choose for this task. You could search for:

"(?<![[ :]u.)(?=[a-zA-Z\d])

and replacing with '.


If you are happy to treat _ as a word character, the regex above can be shortened:

"(?<![[ :]u.)(?=\w)

Breakdown:

  • " Match a double quotation mark
  • (?<![[ :]u.) That is not preceded by a u which itself follows one of the delimiters :, space or [ (the trailing . stands for the quote just matched)
  • (?=\w) And is followed by a word character

Python code:

re.sub(r'"(?<![[ :]u.)(?=\w)', "'", my_string)
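A minimal sketch of the shorter pattern wrapped in a helper and applied to the question's second example. The function name fix_inner_quotes is hypothetical. As an assumption on my side, the [ inside the character class is escaped here, because newer Python 3 releases may emit a FutureWarning about a possible nested set for an unescaped [[; the pattern matches the same text either way.

```python
import re

# Same pattern as above, with the "[" inside the character class escaped.
INNER_QUOTE = re.compile(r'"(?<![\[ :]u.)(?=\w)')

def fix_inner_quotes(text):
    """Replace double quotes inside a word with single quotes (hypothetical helper)."""
    return INNER_QUOTE.sub("'", text)

sample = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
fixed = fix_inner_quotes(sample)
print(fixed)
# [u"column1" : u"abcd", u"column2" : u"teu'st"]
```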

Upvotes: 1

Gahan

Reputation: 4213

>>> import re
>>> my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

>>> print(re.sub(r'("\w+)(")(\w+")', r"\1'\3", my_string))
[u"column1" : u"abcd", u"column2" : u"te'st"]

Explanation:

("\w+) matches a double quote followed by one or more word characters; the parentheses make it a capture group. In your case it matches "te (group 1).

(") matches the quote that follows that word, i.e. the " after "te in your case (group 2).

(\w+") matches one or more word characters followed by a closing quote, i.e. st" in your case (group 3).

In re.sub() the replacement string can refer to captured groups, so we keep only the parts we want:

\1 inserts the text matched by ("\w+) unchanged

\3 inserts the text matched by (\w+") unchanged

\2 is the quote between those two groups; it is left out of the replacement, and a ' is written in its place.
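To see the three groups directly, here is a small sketch that inspects the first match with re.search before substituting. The variable names are only illustrative.

```python
import re

pattern = re.compile(r'("\w+)(")(\w+")')
my_string = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

# Look at what each capture group holds for the first match.
match = pattern.search(my_string)
groups = match.groups()
print(groups)
# ('"te', '"', 'st"')

# Keep groups 1 and 3, and write a single quote where group 2 was.
result = pattern.sub(r"\1'\3", my_string)
print(result)
# [u"column1" : u"abcd", u"column2" : u"te'st"]
```

Because the pattern never looks at the u prefix, it should behave the same for a word such as "teu"st".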

Upvotes: 1
