humanbeing

Reputation: 1697

trouble with df.replace in pandas

I have a set of about 30 columns that I want to convert to int. These columns get pulled in as string dtype but need to be converted to int.

When I do this

df = df[['columns','to','change']].astype(int)

I get

ValueError: invalid literal for long() with base 10: '\xe2\x80\xa0'

So I want to replace that string with None and then drop NA. I am trying this.

# there are some funky values in some columns that should have numbers. Let's trash them
starting_row_count = len(df.index)
df.dropna()
current_row_count = len(df.index)
print current_row_count - starting_row_count

def strip_it(value):
    # culls rotten values
    if type(value) == str:
        value.replace('\xe2\x80\xa0', None)
        value.replace('"', None)
        value.replace('=', None) 
    return value

df = df.astype(str)

df = df.apply(strip_it)

df.dropna()
current_row_count = len(df.index)
print current_row_count - starting_row_count

Both print statements produce 0, showing no rows are removed.

I hoped this would remove those values from my DataFrame, but I am not sure what I am doing wrong.

Upvotes: 2

Views: 509

Answers (2)

humanbeing

Reputation: 1697

I figured it out.

Calling value.replace inside the function was the mistake.

The pandas .replace method acts on a Series or a DataFrame, not on an individual value.
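
For anyone landing here later, a minimal sketch of what that could look like, written for Python 3 and a recent pandas. The sample frame and its column names are made up, and I am assuming the bad cells hold only the junk value (the dagger character "\u2020" is the decoded form of the bytes in the error message). Note that dropna returns a new frame rather than changing df in place, so its result has to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are invented for illustration.
df = pd.DataFrame({
    "a": ["1", "\u2020", "3"],
    "b": ["4", "5", "="],
})

# Assumed list of junk cell values, based on the ones named in the question.
bad_values = ["\u2020", '"', "="]

# DataFrame.replace swaps whole-cell matches for NaN across every column.
cleaned = df.replace(bad_values, np.nan)

# dropna returns a new DataFrame, so assign the result back.
cleaned = cleaned.dropna()

# With the junk rows gone, the remaining strings convert cleanly.
cleaned = cleaned.astype(int)

removed = len(df.index) - len(cleaned.index)
print(removed)
```

This only catches cells that are exactly one of the listed values; a cell such as '="12"' would need a regex-based replace or string cleanup first.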

Thanks, all.

Upvotes: 1

Anton Protopopov

Reputation: 31662

You could do it with convert_objects:

df = df.convert_objects(convert_numeric=True)
df.dropna(inplace=True)

This changes all non-numeric values to NaN automatically.

EDIT: I get a warning when doing it that way:

FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

So you could use apply with pd.to_numeric instead:

df = df.apply(pd.to_numeric, args=('coerce',))
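
Put together as an end-to-end sketch (the sample frame and its column names are hypothetical, and this assumes Python 3 with a recent pandas), the coerce step, the row drop, and the final int conversion could look roughly like this:

```python
import pandas as pd

# Hypothetical sample data; column names are invented for illustration.
df = pd.DataFrame({
    "a": ["1", "\u2020", "3"],
    "b": ["4", "5", "="],
})

# Apply pd.to_numeric column by column; anything unparseable becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows that picked up a NaN, then cast the survivors to int.
# The cast comes last because NaN cannot be stored in a plain int column.
numeric = numeric.dropna().astype(int)

print(numeric)
```

Passing errors="coerce" as a keyword does the same thing as args=('coerce',) above, but it is a bit easier to read.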

Upvotes: 2
