Reputation: 2724
There seem to be a lot of posts about doing this in other languages, but I can't seem to figure out how in Python (I'm using 2.7).
To be clear, I would ideally like to keep the string as Unicode and just replace certain specific characters.
For instance:
thisToken = u'tandh\u2013bm'
print(thisToken)
This prints the word with an en dash (U+2013) in the middle. I would just like to delete the dash, but without indexing, because I want to be able to do this anywhere these specific characters appear.
I tried using replace, as you would with any other character:
newToke = thisToken.replace('\u2013','')
print(newToke)
but it just doesn't work. Any help is much appreciated. Seth
Upvotes: 4
Views: 10626
Reputation: 131
You can see the answer in this post: How to replace unicode characters in string with something else python?
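If your data is a byte string s, decode it to Unicode first (the token in the question is already a unicode object, so that step can be skipped there). Assuming it's UTF-8-encoded:
s.decode("utf-8")
Call the replace method and be sure to pass it a Unicode string as its first argument (u"\u2022" is the bullet character; use u"\u2013" for the dash in the question):
s.decode("utf-8").replace(u"\u2022", u"")
Encode back to UTF-8, if needed:
s.decode("utf-8").replace(u"\u2022", u"").encode("utf-8")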
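The decode, replace, encode round trip can be sketched step by step as below. This is only an illustration: the name raw and its contents are assumptions made up for the example (the question's token, encoded as UTF-8), and the dash from the question is used instead of the bullet.

```python
# Hypothetical input: the question's token as UTF-8 encoded bytes.
raw = u'tandh\u2013bm'.encode('utf-8')

# 1. Decode the bytes to a Unicode string.
text = raw.decode('utf-8')

# 2. Replace using Unicode literals for both arguments.
cleaned = text.replace(u'\u2013', u'')

# 3. Encode back to UTF-8 only if bytes are needed downstream.
result = cleaned.encode('utf-8')

print(cleaned)
```

Keeping the three steps on separate lines makes it easier to see which values are bytes (raw, result) and which are Unicode (text, cleaned).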
Upvotes: 0
Reputation: 76184
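The string you're searching for to replace must also be a Unicode string. In Python 2, '\u2013' without the u prefix is a byte string in which \u is not an escape sequence, so it is six literal characters and never matches the dash. Try: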
newToke = thisToken.replace(u'\u2013','')
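Since the question mentions wanting to strip several specific characters wherever they occur, here is a minimal sketch of one way to do that with translate. The helper name strip_chars and the UNWANTED set are hypothetical, chosen only for illustration, and the sketch assumes the input is already a unicode object (on Python 2, byte strings have a different translate signature).

```python
this_token = u'tandh\u2013bm'

# With the u prefix, the escape is a single code point.
assert len(u'\u2013') == 1
# A raw literal shows what Python 2 sees without the prefix:
# six separate characters (backslash, u, 2, 0, 1, 3).
assert len(r'\u2013') == 6

def strip_chars(text, unwanted):
    """Return text with every character in unwanted removed.

    Hypothetical helper; assumes text is a Unicode string.
    """
    # Mapping a code point to None tells translate to delete it.
    table = {ord(ch): None for ch in unwanted}
    return text.translate(table)

# Illustrative set: en dash, em dash, bullet.
UNWANTED = u'\u2013\u2014\u2022'

print(strip_chars(this_token, UNWANTED))
```

For a single character, the replace call above is simpler; a translate table mainly helps when the list of characters to drop grows.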
Upvotes: 9