Reputation: 687
I have a lot of txt files, and I need to replace some text in them. Almost all of them contain this non-ASCII character (I thought it was "...", but … is not the same).
I've tried replace(), but I cannot make it work. I need some help! Thanks in advance.
Upvotes: 2
Views: 9897
Reputation: 315
The problem is that these characters are not valid in a plain str; they are unicode.
import re
# Pass flags by keyword: the fourth positional argument of re.sub is count, not flags.
re.sub(r'<string to replace>', '', text, flags=re.U)
Most of the other answers will work too.
Upvotes: -1
Reputation: 10146
Use unicode type strings. For example,
>>> print u'\xe2'.replace(u'\xe2','a')
a
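Applied to the ellipsis from the question, a minimal sketch could look like the following. It assumes the text arrives as UTF-8 encoded bytes (the question does not say which encoding the files use), and the sample string is made up for illustration.

```python
# Sample bytes, assumed to be UTF-8: "\xe2\x80\xa6" is the encoded form of U+2026.
raw = b"wait\xe2\x80\xa6 done"

# Decode to a unicode string first, so the ellipsis is one character.
text = raw.decode("utf-8")

# Replace the single ellipsis character with three ASCII dots.
fixed = text.replace(u"\u2026", u"...")
```

Decoding first is the important step: on the raw bytes the ellipsis is three separate bytes, so searching for a one-character "…" may not match.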
Upvotes: 2
Reputation: 799062
If you use codecs.open() to open the files, then you will get all strings as unicode objects, which are much easier to handle.
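A rough sketch of that approach, assuming the files are UTF-8 encoded (adjust the encoding argument if they are not). The helper name replace_in_file is hypothetical and only used for illustration.

```python
import codecs

def replace_in_file(path, old, new, encoding="utf-8"):
    """Hypothetical helper: rewrite one file with `old` replaced by `new`."""
    # codecs.open decodes while reading, so `text` is a unicode string.
    with codecs.open(path, "r", encoding=encoding) as f:
        text = f.read()
    text = text.replace(old, new)
    # Write the result back using the same (assumed) encoding.
    with codecs.open(path, "w", encoding=encoding) as f:
        f.write(text)
    return text
```

It could then be called once per file, for example replace_in_file(path, u"\u2026", u"...") to turn the ellipsis character into three plain dots. Since it overwrites the file in place, it is probably worth trying on a copy first.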
Upvotes: 4