Yerko Antonio

Reputation: 687

Remove \xe2\x80\xa6 from string python

I have a lot of txt files, and I need to replace some text in them. Almost all of them contain this non-ASCII character (I thought it was "...", but … is not the same). I've tried replace(), but I can't get it to work. I need some help! Thanks in advance.

Upvotes: 2

Views: 9897

Answers (3)

Vignesh

Reputation: 315

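The problem is that these characters are not plain ASCII text; they are Unicode. The bytes \xe2\x80\xa6 are the UTF-8 encoding of a single character, U+2026 (HORIZONTAL ELLIPSIS).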

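import re
# Pass flags by keyword: the fourth positional argument of re.sub is count, not flags.
text = re.sub(r'<string to replace>', '', text, flags=re.U)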

Most of the other answers will work too.
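To make the regex approach concrete, here is a minimal sketch, assuming the files are UTF-8 and that the goal is to turn the ellipsis character into three plain dots. The sample bytes and the variable names are made up for illustration.

```python
import re

# Sample bytes as they might come out of a UTF-8 file (illustrative data).
raw = b"Loading\xe2\x80\xa6 done"

# Decoding turns the three bytes \xe2\x80\xa6 into one character, U+2026.
text = raw.decode("utf-8")

# flags is passed by keyword; passed positionally it would be read as count.
cleaned = re.sub(u"\u2026", u"...", text, flags=re.UNICODE)

print(cleaned)
```

For a single fixed character a regex is more than is needed, but the same call generalizes to a pattern such as a character class covering several unwanted characters.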

Upvotes: -1

Matthew Adams

Reputation: 10146

Use unicode strings rather than byte strings. For example:

>>> print u'\xe2'.replace(u'\xe2','a')
a
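Applied to the character from the question, the same idea might look like the sketch below. It assumes the text has already been decoded to unicode; `replace_ellipsis` is a hypothetical helper name, not something from a library. The `u` prefix is kept so the snippet reads the same on Python 2 and Python 3.

```python
# U+2026 HORIZONTAL ELLIPSIS; its UTF-8 encoding is the bytes \xe2\x80\xa6.
ELLIPSIS = u"\u2026"

def replace_ellipsis(text, replacement=u"..."):
    """Return text with every U+2026 swapped for replacement (hypothetical helper)."""
    return text.replace(ELLIPSIS, replacement)

sample = u"wait\u2026 what\u2026"
result = replace_ellipsis(sample)
print(result)
```

Passing an empty string as `replacement` removes the character instead of substituting it.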

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 799062

If you use codecs.open() with an explicit encoding to open the files, then everything you read comes back as unicode strings, which are much easier to handle.
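A rough sketch of the whole read, replace, write cycle is below. Note the swap: it uses `io.open()` rather than `codecs.open()`, since `io.open()` also decodes on read and behaves the same on Python 2 and Python 3. The UTF-8 encoding is an assumption about the files, and `replace_in_file` and the temporary demo file are hypothetical, for illustration only.

```python
import io
import os
import tempfile

def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite the file at path with old replaced by new (hypothetical helper)."""
    # Reading in text mode with an encoding yields unicode, not raw bytes.
    with io.open(path, "r", encoding=encoding) as f:
        text = f.read()
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text.replace(old, new))

# Demo on a throwaway file containing the raw bytes from the question.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "wb") as f:
    f.write(b"to be continued\xe2\x80\xa6\n")

replace_in_file(demo_path, u"\u2026", u"...")

with io.open(demo_path, "r", encoding="utf-8") as f:
    demo_result = f.read()
os.remove(demo_path)
print(demo_result)
```

To process many files, the helper could be called in a loop over the paths, for example those returned by `glob.glob("*.txt")`. It overwrites files in place, so trying it on copies first is the safer route.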

Upvotes: 4
