Reputation: 511
I have a string which looks like this:
my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
I'd like to replace the double quotes that are in the middle of a word with single quotes, without changing the ones at the beginning or at the end. That is, I'd like my_string to look like this:
'''[u"column1" : u"abcd", u"column2" : u"te'st"]'''
Right now, I'm just using a workaround solution to do so. Basically, my solution replaces the double quotes that are in the middle of words if they are not preceded by the letter u. Here is what it looks like:
unusual=re.findall(r'([a-tv-zA-TV-Z0-9]\"[a-zA-Z0-9])', my_string)
if unusual:
    for un in unusual:
        my_string=my_string.replace(un, un.replace('"', "'"))
This works for me now, but I'd like to improve it, because it stops working as soon as a u sits in the middle of a word right before a double quote. For example: my_string='''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
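To make the failure concrete, here is a minimal sketch of the workaround wrapped in a function. The name replace_inner_quotes_workaround is made up for this illustration; the regex is the same one as above.

```python
import re

def replace_inner_quotes_workaround(text):
    # Find "letter/digit (except u/U), double quote, letter/digit" triples
    # and swap the double quote inside each one for a single quote.
    for found in re.findall(r'[a-tv-zA-TV-Z0-9]"[a-zA-Z0-9]', text):
        text = text.replace(found, found.replace('"', "'"))
    return text

ok = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
bad = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''

print(replace_inner_quotes_workaround(ok))   # te"st becomes te'st
print(replace_inner_quotes_workaround(bad))  # teu"st is left untouched
```

The second call returns its input unchanged, because the character before the inner quote is a u and the first character class excludes it.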
Can I get some help with this guys ? I'm running out of ideas :)
PS: I'm using python 2.7
Upvotes: 2
Views: 1899
Reputation: 43169
You could try to use lookarounds (not 100% perfect):
(?<=\w)(?<![\[\s:]u)"(?=\w)
and replace these occurrences with '. See a demo on regex101.com.
(?<=\w) # require a word character immediately before
(?<![\[\s:]u) # not directly after [u, :u or whitespace + u (the u string prefix)
" # a double quote
(?=\w) # require a word character afterwards.
Python:
import re
my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
rx = re.compile(r'(?<=\w)(?<![\[\s:]u)"(?=\w)')
new_string = rx.sub("'", my_string)
print(new_string)
# [u"column1" : u"abcd", u"column2" : u"te'st"]
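As a rough check against the teu"st case from the question, the same pattern can be written with re.VERBOSE so the comments sit inside the regex. This is only a sketch: the names INNER_QUOTE and fix_inner_quotes, and the second sample string, are assumptions made for illustration.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE.
INNER_QUOTE = re.compile(r'''
    (?<=\w)          # a word character immediately before the quote
    (?<![\[\s:]u)    # ...that is not the u prefix following [, whitespace or :
    "                # the double quote to replace
    (?=\w)           # a word character immediately after the quote
''', re.VERBOSE)

def fix_inner_quotes(text):
    return INNER_QUOTE.sub("'", text)

# The case that broke the original workaround: a u right before the inner quote.
print(fix_inner_quotes('''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''))

# One way the "not 100% perfect" caveat shows up: an inner quote that follows
# " u" inside a value looks like a string prefix and is left alone.
print(fix_inner_quotes('''[u"column1" : u"a u"b"]'''))
```

The first call turns teu"st into teu'st; the second returns its input unchanged.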
Better yet: fix the string where it came from.
Upvotes: 2
Reputation: 48711
Be careful with both the string itself and the approach you choose for this task. You could search for:
"(?<![[ :]u.)(?=[a-zA-Z\d])
and replace it with '.
If you consider _ a word character, the regex above can be shortened to:
"(?<![[ :]u.)(?=\w)
Breakdown:
" matches a double quotation mark
(?<![[ :]u.) that is not preceded by a u coming right after one of the delimiters :, space or [ (the . stands for the quote that was just matched)
(?=\w) and that is followed by a word character
Python code:
re.sub(r'"(?<![[ :]u.)(?=\w)', "'", my_string)
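A small sketch comparing the two variants on a made-up sample string. The names NARROW, BROAD and sample are illustrative assumptions. The [ inside the set is escaped here because, as far as I know, recent Python 3 versions emit a FutureWarning for an unescaped [ at the start of a set; the behaviour should otherwise be the same.

```python
import re

NARROW = re.compile(r'"(?<![\[ :]u.)(?=[a-zA-Z\d])')  # letters and digits only
BROAD = re.compile(r'"(?<![\[ :]u.)(?=\w)')           # also accepts _

sample = '''[u"column2" : u"teu"st", u"column3" : u"te"_st"]'''

narrow_result = NARROW.sub("'", sample)
broad_result = BROAD.sub("'", sample)

print(narrow_result)  # only teu"st is changed
print(broad_result)   # te"_st is changed as well
```

Both variants handle the teu"st case from the question; they only differ when the character after the inner quote is an underscore.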
Upvotes: 1
Reputation: 4213
>>> import re
>>> my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
>>> print(re.sub(r'("\w+)(")(\w+")', r"\1'\3", my_string))
[u"column1" : u"abcd", u"column2" : u"te'st"]
Explanation:
("\w+) matches a word that starts with a quote "; the parentheses define a capture group. In your case it matches "te (group 1).
(") matches the quote that follows that word, i.e. the " after "te in your case (group 2).
(\w+") matches a word that ends with a quote ", i.e. st" in your case (group 3).
In re.sub() the replacement string can refer directly to the groups of the match that should be kept:
\1 keeps all the characters matched by ("\w+) unchanged.
\3 keeps all the characters matched by (\w+") unchanged.
\2 stands for the quote " between those two groups, so by leaving it out we can write any character(s) in its place.
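A minimal sketch of the group-based substitution wrapped in a function, tried on the teu"st case from the question and on a made-up value with two inner quotes. The names GROUPED and replace_middle_quote and the second sample are assumptions for illustration.

```python
import re

GROUPED = re.compile(r'("\w+)(")(\w+")')

def replace_middle_quote(text):
    # Groups 1 and 3 are copied back unchanged; group 2 (the inner quote)
    # is dropped and a single quote is written in its place.
    return GROUPED.sub(r"\1'\3", text)

one_inner = replace_middle_quote('''[u"column2" : u"teu"st"]''')
two_inner = replace_middle_quote('''[u"column2" : u"a"b"c"]''')

print(one_inner)  # teu"st becomes teu'st
print(two_inner)  # only the first inner quote of a"b"c is replaced
```

Because the match consumes the quote that closes group 3, a value containing more than one inner quote is only partly fixed by this pattern.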
Upvotes: 1