Himanshu Ahuja

Reputation: 37

Special Unicode Characters are not removed in Python 3

I have a list named keys that contains words. When I run this loop:

for key in keys:
  print(key)

I get normal output in the terminal:

[screenshot: output of the loop]

But when I print the entire list using print(keys), the words are shown with \u202c escape sequences:

[screenshot: output of print(keys)]

I have tried key.replace("\u202c", ''), key.replace("\\u202c", ''), and re.sub(u'\u202c', '', key), but none of them solved the problem. I also tried the solutions from these questions, and none of them worked either:

Replacing a unicode character in a string in Python 3

Removing unicode \u2026 like characters in a string in python2.7

Python removing extra special unicode characters

How can I remove non-ASCII characters but leave periods and spaces using Python?

I scraped this from Google Trends using Beautiful Soup and retrieved the text with get_text(). In the page source of the Google Trends page, the words are listed as follows:

[screenshot: page source]

When I pasted the text here directly from the page source, it pasted without these unusual symbols.

Upvotes: 1

Views: 3032

Answers (1)

riteshtch

Reputation: 8769

You can remove the characters with str.strip, which trims them from the start and end of each string:

>>> keys=['\u202cABCD', '\u202cXYZ\u202c']
>>> for key in keys:
...     print(key)
... 
ABCD
XYZ‬
>>> newkeys=[key.strip('\u202c') for key in keys]
>>> print(keys)
['\u202cABCD', '\u202cXYZ\u202c']
>>> print(newkeys)
['ABCD', 'XYZ']
>>> 
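
One caveat worth illustrating: str.strip only trims characters at the two ends of a string, so a \u202c in the middle of a word would survive it. The sketch below uses made-up sample data (the second key is hypothetical, not taken from the question) to compare strip with replace:

```python
# Hypothetical sample data: U+202C at the edges and, in the second key, in the middle.
keys = ['\u202cABCD', 'XY\u202cZ\u202c']

# strip() removes the character only from the start and end of each string.
stripped = [key.strip('\u202c') for key in keys]

# replace() removes every occurrence, wherever it appears.
replaced = [key.replace('\u202c', '') for key in keys]

print(stripped)  # ['ABCD', 'XY\u202cZ']
print(replaced)  # ['ABCD', 'XYZ']
```

If the scraped words only ever carry the character at their edges, strip is enough; replace is the safer choice when that is not guaranteed.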

I also tried one of your methods, key.replace, and it does work for me:

>>> keys
['\u202cABCD', '\u202cXYZ\u202c']
>>> newkeys=[]
>>> for key in keys:
...     newkeys += [key.replace('\u202c', '')]
... 
>>> newkeys
['ABCD', 'XYZ']
>>> 
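
Two points may explain the behaviour in the question. First, print(key) writes the string itself, so the terminal draws the invisible character as nothing, while print(keys) writes the repr() of each element, which escapes non-printable characters. Second, strings are immutable, so replace returns a new string and the result has to be stored. Whether the original loop discarded that result is an assumption, since the question does not show it. The sketch below demonstrates both points and adds a more general cleanup; the helper name remove_format_chars and the sample data are hypothetical:

```python
import unicodedata

# Hypothetical sample data containing two different invisible format characters.
keys = ['\u202cABCD', '\u202aXYZ\u202c']

# repr() escapes the non-printable character that print(key) renders invisibly.
print(repr(keys[0]))  # '\u202cABCD'

# Pitfall: the return value of replace() is discarded, so keys is unchanged.
for key in keys:
    key.replace('\u202c', '')
print(keys)  # ['\u202cABCD', '\u202aXYZ\u202c']


def remove_format_chars(text):
    """Drop every character whose Unicode general category is Cf (format)."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')


# Build a new list so the cleaned strings are actually kept.
cleaned = [remove_format_chars(key) for key in keys]
print(cleaned)  # ['ABCD', 'XYZ']
```

Filtering on category Cf also removes characters such as the soft hyphen and the zero-width joiner, which can matter for some scripts and emoji sequences, so a targeted replace may be preferable when only \u202c is a problem.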

Upvotes: 2
