Reputation: 41
I'm new to pandas and Python, and I have a question about replacing multiple Unicode characters across an entire DataFrame. I'm using Python 2.7 and importing from an Excel sheet. I want to replace every non-ASCII character with its ASCII equivalent, or with nothing.
Examples:
u'SHOGUN JAPANESE \u2013 GRAND'
u'COMFORT INN & SUITES\xa0STONE MOUNTAIN'
This works, but is cumbersome:
rawdf = rawdf["Account_Name"].str.upper().str.replace(u'\u2013', ' ').str.replace(u'\xa0', '-') + "|" + rawdf["COID"].str.upper()
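As a sketch of a less repetitive variant (written for Python 3, with a hypothetical two-row frame standing in for the Excel data), the chain of `.str.replace` calls can be collapsed into one `Series.str.translate` call that takes a table mapping code points to replacement strings. It keeps the same substitutions as the line above.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns.
rawdf = pd.DataFrame({
    "Account_Name": [u"SHOGUN JAPANESE \u2013 GRAND",
                     u"COMFORT INN & SUITES\xa0STONE MOUNTAIN"],
    "COID": [u"a1", u"b2"],
})

# One translation table instead of one .str.replace call per character.
# Keys are code points (ord); values are the replacement strings.
# Same substitutions as the chained version: en dash -> ' ', NBSP -> '-'.
table = {ord(u"\u2013"): u" ", ord(u"\xa0"): u"-"}

combined = (rawdf["Account_Name"].str.upper().str.translate(table)
            + "|" + rawdf["COID"].str.upper())
print(combined.tolist())
```

Adding another character only means adding another entry to `table`.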
This did not work:
rawdf = rawdf.replace(u'\u2013', ' ')
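A likely reason the last line does nothing: `DataFrame.replace` compares whole cell values by default, so a plain string only matches cells that are exactly `u'\u2013'`. Passing the patterns through the `regex` argument makes it substitute inside strings across the whole frame. This is a minimal sketch assuming Python 3 and a hypothetical sample frame; the mapping shown is only an example.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns.
rawdf = pd.DataFrame({
    "Account_Name": [u"SHOGUN JAPANESE \u2013 GRAND",
                     u"COMFORT INN & SUITES\xa0STONE MOUNTAIN"],
    "COID": [u"a1", u"b2"],
})

# Keys are treated as regular expressions and replaced inside each string
# cell of every column; example mapping: en dash -> '-', NBSP -> ' '.
replacements = {u"\u2013": u"-", u"\xa0": u" "}
cleaned = rawdf.replace(regex=replacements)
print(cleaned["Account_Name"].tolist())
```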
Upvotes: 2
Views: 328
Reputation: 28683
You can do an encode/decode cycle like so:
rawdf["Account_Name"].str.encode('ascii', 'ignore').str.decode('ascii')
Passing 'ignore' drops any character that cannot be represented in ASCII. The intermediate result is bytes, so we need to decode it back to strings.
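To get "ASCII equivalent or nothing" across the whole DataFrame, one option (a sketch assuming Python 3 and a hypothetical sample frame) is to run `Series.str.normalize('NFKD')` before the encode/decode cycle and apply it to every string column. NFKD turns characters that have a compatibility decomposition into plain forms (the no-break space `\xa0` becomes an ordinary space); characters without one, such as the en dash `\u2013`, are then dropped by `'ignore'`.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical sample data mirroring the question's columns.
rawdf = pd.DataFrame({
    "Account_Name": [u"SHOGUN JAPANESE \u2013 GRAND",
                     u"COMFORT INN & SUITES\xa0STONE MOUNTAIN"],
    "COID": [u"a1", u"b2"],
})

def to_ascii(series):
    # NFKD maps compatibility characters to plain ones ('\xa0' -> ' ');
    # anything still non-ASCII afterwards is dropped by 'ignore'.
    return (series.str.normalize("NFKD")
                  .str.encode("ascii", "ignore")
                  .str.decode("ascii"))

# Clean every string column; other dtypes are left untouched.
for col in rawdf.columns:
    if is_string_dtype(rawdf[col]):
        rawdf[col] = to_ascii(rawdf[col])

print(rawdf["Account_Name"].tolist())
```

Note that dropping the dash leaves two adjacent spaces in the first name; collapse them afterwards if that matters.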
Upvotes: 1