seth127

Reputation: 2724

Replace or delete specific Unicode characters in Python

There seem to be a lot of posts about doing this in other languages, but I can't seem to figure out how in Python (I'm using 2.7).

To be clear, I would ideally like to keep the string as Unicode and just replace certain specific characters.

For instance:

thisToken = u'tandh\u2013bm'
print(thisToken)

prints the word with the dash (U+2013, an en dash) in the middle. I would just like to delete the dash, but without using indexing, because I want to be able to do this anywhere I find these specific characters.

I tried using replace as you would with any other character:

newToke = thisToken.replace('\u2013','')
print(newToke)

but it doesn't work: the dash is still there when I print the result. Any help is much appreciated. Seth

Upvotes: 4

Views: 10626

Answers (2)

megavexus

Reputation: 131

You can see the answer in this post: How to replace unicode characters in string with something else python?

Decode the byte string to Unicode, assuming it's UTF-8-encoded (byte_string stands for your own variable):

byte_string.decode("utf-8")

Call the replace method and be sure to pass it a Unicode string as its first argument:

byte_string.decode("utf-8").replace(u"\u2022", u"")

Encode back to UTF-8, if needed:

byte_string.decode("utf-8").replace(u"\u2022", u"").encode("utf-8")
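The three steps above can be put together as one runnable sketch. It assumes the input really is UTF-8 bytes; raw_bytes is a hypothetical stand-in for data read from a file or socket, and the character removed is the U+2013 from the question rather than U+2022. If the value is already a unicode object, as in the question, the decode and encode steps are not needed.

```python
# raw_bytes is a hypothetical example input: the question's token encoded as UTF-8.
raw_bytes = u'tandh\u2013bm'.encode('utf-8')

# 1. Decode the bytes into a Unicode string.
text = raw_bytes.decode('utf-8')

# 2. Replace using Unicode literals for both arguments.
cleaned_text = text.replace(u'\u2013', u'')

# 3. Encode back to UTF-8 only if the caller needs bytes again.
cleaned_bytes = cleaned_text.encode('utf-8')
```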

Upvotes: 0

Kevin

Reputation: 76184

The search string you pass to replace must also be a Unicode string, so give the literal a u prefix. Try:

newToke = thisToken.replace(u'\u2013','')
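A short sketch of why the u prefix matters, plus one way to delete several characters at once. The names six_chars, UNWANTED, DELETE_TABLE and strip_unwanted are made up for illustration, and the set of characters to drop is only an assumption about what you might want removed.

```python
from __future__ import print_function

thisToken = u'tandh\u2013bm'

# In Python 2.7 a plain '...' literal is a byte string, and byte strings do
# not interpret \u escapes, so '\u2013' there is six separate characters.
# This raw literal spells out those same six characters on any Python version.
six_chars = r'\u2013'
unchanged = thisToken.replace(six_chars, u'')  # no match, nothing removed

# With a unicode literal, \u2013 is a single character (EN DASH) and matches.
newToke = thisToken.replace(u'\u2013', u'')
print(newToke)

# Hypothetical helper: delete several characters in one pass with translate.
# Mapping a code point to None tells translate to drop that character.
UNWANTED = u'\u2013\u2014\u2022'  # en dash, em dash, bullet (example set)
DELETE_TABLE = dict((ord(ch), None) for ch in UNWANTED)

def strip_unwanted(text):
    """Return text with every character in UNWANTED removed."""
    return text.translate(DELETE_TABLE)

print(strip_unwanted(u'a\u2014b\u2022c\u2013d'))
```

For a single character, replace is the simplest choice; translate becomes more convenient once the list of unwanted characters grows, since the string is scanned only once.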

Upvotes: 9
