reverseScoopShoot
reverseScoopShoot

Reputation: 67

Pandas Dataframe column with both Strings and Floats

I have a dataframe where one of the columns holds strings and floats.

The column is named 'Value' has values like "AAA", "Korea, Republic of", "123,456.78" and "5000.00".

The first two values are obviously strings, and the last is obviously a float. The third value should be a float as well, but due to the commas, the next step of my code sees it as a string.

Is there an easy way for me to remove the commas for those values that are really floats but keep them for values that are really strings? So "Korea, Republic of" stays, but "123,456,78" converts to "123456.78".

Thanks.

Upvotes: 3

Views: 4531

Answers (1)

Ami Tavory
Ami Tavory

Reputation: 76297

To begin with, your Pandas column does not contain strings and floats, since columns contain homogeneous types. If one entry is a string, then all of them are. You can verify this by doing something like (assuming the DataFrame is df and the column is c):

>>> df.dtypes

and noticing that the type should be something like Object.

Having said that, you can convert the string column to a different string column, where the strings representing numbers, have the commas removed. This might be useful for further operations, e.g., when you wish to see which entries can be converted to floats. This can be done as follows.

First, write a function like:

import re

def remove_commas_from_numbers(n):
    r = re.compile(r'^(\d+(?:,\d+)?.+)*$')
    m = r.match(n)
    if not m:
        return n
    return n.replace(',', '')

remove_commas_from_numbers('1,1.')

Then, you can do something like:

>>> df.c = df.c.apply(remove_commas_from_numbers)

Again, it's important to note that df.c's type will be string.

Upvotes: 4

Related Questions