Reputation: 67
I have a dataframe where one of the columns holds strings and floats.
The column is named 'Value' has values like "AAA", "Korea, Republic of", "123,456.78" and "5000.00".
The first two values are obviously strings, and the last is obviously a float. The third value should be a float as well, but due to the commas, the next step of my code sees it as a string.
Is there an easy way for me to remove the commas for those values that are really floats but keep them for values that are really strings? So "Korea, Republic of" stays, but "123,456,78" converts to "123456.78".
Thanks.
Upvotes: 3
Views: 4531
Reputation: 76297
To begin with, your Pandas column does not contain strings and floats, since columns contain homogeneous types. If one entry is a string, then all of them are. You can verify this by doing something like (assuming the DataFrame is df
and the column is c
):
>>> df.dtypes
and noticing that the type should be something like Object
.
Having said that, you can convert the string column to a different string column, where the strings representing numbers, have the commas removed. This might be useful for further operations, e.g., when you wish to see which entries can be converted to float
s. This can be done as follows.
First, write a function like:
import re
def remove_commas_from_numbers(n):
r = re.compile(r'^(\d+(?:,\d+)?.+)*$')
m = r.match(n)
if not m:
return n
return n.replace(',', '')
remove_commas_from_numbers('1,1.')
Then, you can do something like:
>>> df.c = df.c.apply(remove_commas_from_numbers)
Again, it's important to note that df.c
's type will be string.
Upvotes: 4