Pamela Kelly
Pamela Kelly

Reputation: 54

python replace and sub not working with unicode character u"\u0092"

Python Version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka curly apostrophe) with a regular apostrophe.

I have tried all of the below:

    mystring = <some string with problem character>
    # option 1 
    mystring = mystring.replace(u"\u0092", u\"0027")
    # option 2 
    mystring = mystring.replace(u"\u0092", "'")
    # option 3
    mystring = re.sub('\u0092',u"\u0027", mystring)
    # option 4
    mystring = re.sub('\u0092',u"'", mystring)

None of the above updates the character in mystring. Other sub and replace operations are working - which makes me think it is either an issue with how I am using the Unicode characters, or an issue with this particular character.

Update: I have also tried the suggestion below neither of which work:

    mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
    mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")

But it gives me the error: AttributeError: 'str' object has no attribute 'decode'

Just to Clarify: The IDE is not the core issue here. My question is why when I run replace or sub with a Unicode character and print the result does it not register - the character is still present in the string.

Upvotes: 0

Views: 1958

Answers (1)

bobrobbob
bobrobbob

Reputation: 1281

your code is wrong it's \u2019 for apostrophe (’). from wikipedia

U+0092 146 Private Use 2 PU2

that's why eclipse is not happy.


with the right code:

#_*_ coding: utf8 _*_
import re
string = u"dkfljglkdfjg’fgkljlf"
string = string.replace(u"’", u"'"))
string = string.replace(u"\u2019", u"\u0027")
string = re.sub(u'\u2019',u"\u0027", string)
string = re.sub(u'’',u"'", string)

all solutions work

and don't call your vars str

Upvotes: 1

Related Questions