Reputation: 54
Python Version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka curly apostrophe) with a regular apostrophe.
I have tried all of the below:
mystring = <some string with problem character>
# option 1
mystring = mystring.replace(u"\u0092", u"\u0027")
# option 2
mystring = mystring.replace(u"\u0092", "'")
# option 3
mystring = re.sub('\u0092',u"\u0027", mystring)
# option 4
mystring = re.sub('\u0092',u"'", mystring)
None of the above updates the character in mystring. Other sub and replace operations are working - which makes me think it is either an issue with how I am using the Unicode characters, or an issue with this particular character.
Update: I have also tried the suggestions below, neither of which works:
mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")
But it gives me the error: AttributeError: 'str' object has no attribute 'decode'
Just to clarify: the IDE is not the core issue here. My question is why, when I run replace or sub with a Unicode character and print the result, the change does not register: the character is still present in the string.
Upvotes: 0
Views: 1958
Reputation: 1281
Your code point is wrong: the curly apostrophe (’) is \u2019, not \u0092.
From Wikipedia:
U+0092 146 Private Use 2 PU2
U+0092 is a C1 control character, not a printable apostrophe, which is why Eclipse is not happy.
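One plausible source of the confusion, offered as an assumption since the question doesn't show where the text came from: in Windows-1252 the byte 0x92 is the curly apostrophe, while decoding that same byte as Latin-1 produces the control character U+0092. A small sketch of the difference (the sample bytes are made up):

```python
# Hypothetical sample bytes; 0x92 is the curly apostrophe in Windows-1252.
raw = b"it\x92s"

as_cp1252 = raw.decode("cp1252")   # 0x92 becomes U+2019 (right single quotation mark)
as_latin1 = raw.decode("latin-1")  # 0x92 becomes U+0092 (a C1 control character)

# Each string needs a different search character to be fixed.
fixed_cp1252 = as_cp1252.replace("\u2019", "'")
fixed_latin1 = as_latin1.replace("\u0092", "'")
```

So which escape to search for depends on how the bytes were decoded before they reached your string.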
with the right code:
# -*- coding: utf-8 -*-
import re
string = u"dkfljglkdfjg’fgkljlf"
string = string.replace(u"’", u"'")
string = string.replace(u"\u2019", u"\u0027")
string = re.sub(u'\u2019',u"\u0027", string)
string = re.sub(u'’',u"'", string)
Each of these works on its own.
Also, don't name your variables str: that shadows the built-in type.
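If a replacement still seems to do nothing, it may help to check which code points the string actually contains instead of guessing from how it is displayed. A minimal sketch; `non_ascii_codepoints` and `normalize_apostrophes` are hypothetical helper names, and the sample string is made up:

```python
def non_ascii_codepoints(text):
    """Return the distinct non-ASCII characters in text as sorted U+XXXX labels."""
    return sorted({"U+%04X" % ord(ch) for ch in text if ord(ch) > 127})

# Map both candidate characters to a plain apostrophe in a single pass.
APOSTROPHE_MAP = str.maketrans({"\u2019": "'", "\u0092": "'"})

def normalize_apostrophes(text):
    """Return a new string; str objects are immutable, so keep the result."""
    return text.translate(APOSTROPHE_MAP)

# Made-up sample containing both the curly apostrophe and the C1 control character.
sample = "it\u2019s and it\u0092s"
print(non_ascii_codepoints(sample))
print(normalize_apostrophes(sample))
```

On the AttributeError from the question: in Python 3 a str is already Unicode text, so .decode exists only on bytes objects. Calling .replace directly on the str is enough, as long as the result is assigned back to a variable.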
Upvotes: 1