Sean

Reputation: 41

Pandas dataframe replace

I'm new to pandas and Python and have a question about replacing multiple Unicode characters across an entire DataFrame. I'm using Python 2.7 and importing from an Excel sheet. I want to replace every non-ASCII character with its ASCII equivalent, or with nothing.

examples:
u'SHOGUN JAPANESE \u2013 GRAND'
u'COMFORT INN & SUITES\xa0STONE MOUNTAIN'

This works, but is cumbersome:

rawdf = rawdf["Account_Name"].str.upper().str.replace(u'\u2013', ' ').str.replace(u'\xa0', '-') + "|" + rawdf["COID"].str.upper()

This did not work:

rawdf = rawdf.replace(u'\u2013', ' ')

Upvotes: 2

Views: 328

Answers (1)

mdurant

Reputation: 28683

You can do an encode/decode cycle like so:

rawdf["Account_Name"].str.encode('ascii', 'ignore').str.decode('ascii')

Passing 'ignore' drops any character that cannot be represented in ASCII. The intermediate representation is bytes, so we need to decode it back to strings again.
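If some characters should become an ASCII equivalent rather than vanish, one option is to translate them before the encode/decode cycle. The following is only a sketch: the `to_ascii` helper and the `ASCII_EQUIVALENTS` mapping are hypothetical names of mine, and the two mappings are a guess at what "equivalent" should mean for the examples in the question.

```python
# Hypothetical mapping (an assumption, not from the question): extend as needed.
ASCII_EQUIVALENTS = {
    0x2013: u"-",  # en dash -> hyphen
    0x00A0: u" ",  # no-break space -> plain space
}


def to_ascii(value):
    """Replace mapped characters, then drop any remaining non-ASCII ones."""
    # Leave non-text cells (numbers, None, NaN) untouched.
    if not isinstance(value, type(u"")):
        return value
    mapped = value.translate(ASCII_EQUIVALENTS)
    # Same encode/decode cycle as above, applied to a single string.
    return mapped.encode("ascii", "ignore").decode("ascii")
```

For one column this could be applied with rawdf["Account_Name"].map(to_ascii), and looping over rawdf.columns would cover the whole frame. As for the attempt that did not work: as far as I know, DataFrame.replace only matches whole cell values unless regex=True is passed, so rawdf.replace(u'\u2013', u'-', regex=True) may be another route for a single character.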

Upvotes: 1
