Reputation: 443
I'm working on with some identification numbers that are pulled from an excel spreadsheet. They are a mixture of numbers and strings. (manually maintained excel sheet) I loaded this into pandas and noticed some spaces in the data. I ran a filter on my df, but only got 2 rows back. So, I went in and ran this code:
for i in exceldata['CL ID #']:
print(repr(str(i)))
I got some odd results back. It seems the spaces aren't spaces at all, but some sort of weird, non ascii character. Here are a few values and how they look printed in different ways.
for i in exceldata['ID']:
print(repr(str(i)))
'502705'
'527167\xa0'
'575405\xa0'
...And those same values printed normally
exceldata['ID']
502705
502717
537545
I've dug around and found solutions on how to change the values. But, here's where my request differs: I need to clean up these odd values in a pandas dataframe. I couldn't figure out how to do it in set logic. It's a very easy fix, but you can't use loops on Pandas df's since you're only editing a copy, not the actual df itself. Any help/direction is appreciated.
EDIT: This is not a duplicate. I need to be able to iterate over a pandas dataframe, and that is the part that is tripping me up.
Upvotes: 0
Views: 1861
Reputation: 4558
Given the column contains a mixture of numbers and strings, you might be better off converting everything to a string and using pandas string manipulation methods on the column. For instance, to replace \xa0
with an empty string in the column:
exceldata['ID'] = exceldata['ID'].astype(str).str.replace(u'\xa0', '')
# ^^^^^^^^^^^^ potentially unnecessary, depending on the format of your data
This will replace the ID
column with the string representation of the values, but with the \xa0
characters removed.
Upvotes: 3