Angus Gray
Angus Gray

Reputation: 443

replacing a non ascii space character in python (pandas)

I'm working on with some identification numbers that are pulled from an excel spreadsheet. They are a mixture of numbers and strings. (manually maintained excel sheet) I loaded this into pandas and noticed some spaces in the data. I ran a filter on my df, but only got 2 rows back. So, I went in and ran this code:

for i in exceldata['CL ID #']:
    print(repr(str(i)))

I got some odd results back. It seems the spaces aren't spaces at all, but some sort of weird, non ascii character. Here are a few values and how they look printed in different ways.

for i in exceldata['ID']:
    print(repr(str(i)))

'502705'

'527167\xa0'

'575405\xa0'

...And those same values printed normally

exceldata['ID']

   502705
   502717 
   537545 

I've dug around and found solutions on how to change the values. But, here's where my request differs: I need to clean up these odd values in a pandas dataframe. I couldn't figure out how to do it in set logic. It's a very easy fix, but you can't use loops on Pandas df's since you're only editing a copy, not the actual df itself. Any help/direction is appreciated.

EDIT: This is not a duplicate. I need to be able to iterate over a pandas dataframe, and that is the part that is tripping me up.

Upvotes: 0

Views: 1861

Answers (1)

PaSTE
PaSTE

Reputation: 4558

Given the column contains a mixture of numbers and strings, you might be better off converting everything to a string and using pandas string manipulation methods on the column. For instance, to replace \xa0 with an empty string in the column:

exceldata['ID'] = exceldata['ID'].astype(str).str.replace(u'\xa0', '')
#                                ^^^^^^^^^^^^ potentially unnecessary, depending on the format of your data

This will replace the ID column with the string representation of the values, but with the \xa0 characters removed.

Upvotes: 3

Related Questions