Reputation: 1697
I have a set of about 30 columns that I want to convert to int. These columns get pulled in as string dtype but need to be converted to int.
When I do this
df = df[['columns','to','change']].astype(int)
I get
ValueError: invalid literal for long() with base 10: '\xe2\x80\xa0'
So I want to replace that string with None and then drop NA. I am trying this.
# There are some funky values in some columns that should hold numbers. Let's trash them.
starting_row_count = len(df.index)
df.dropna()
current_row_count = len(df.index)
print current_row_count - starting_row_count
def strip_it(value):
    #culls rotten values
    if type(value) == str:
        value.replace('\xe2\x80\xa0', None)
        value.replace('"', None)
        value.replace('=', None)
    return value
df = df.astype(str)
df = df.apply(strip_it)
df.dropna()
current_row_count = len(df.index)
print current_row_count - starting_row_count
Both print statements produce 0, showing no rows are removed.
I hoped this would remove those values from my DataFrame, but I am not sure what I am doing wrong.
Upvotes: 2
Views: 509
Reputation: 1697
I figured it out.
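The value.replace calls in strip_it are the problem.
The pandas .replace I wanted acts on a Series or a DataFrame, not on an individual value. df.apply also hands strip_it whole columns rather than single strings, so the str check never matched.
Thanks, all.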
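For anyone landing here later, a minimal sketch of the column-level cleanup, not necessarily the exact code I ended up with. It assumes Python 3, where the bytes '\xe2\x80\xa0' show up as the single character "\u2020" (a dagger), and the sample frame and its column names are made up for illustration. Note that dropna() returns a new frame, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame({
    "a": ["1", "\u2020", "3"],   # "\u2020" is the dagger that '\xe2\x80\xa0' encodes in UTF-8
    "b": ['"4"', "5", "=6"],
})

# DataFrame.apply passes each column as a Series, so a `type(value) == str`
# check inside the applied function never matches.
arg_types = df.apply(lambda col: type(col).__name__)

# Strip the stray quote and equals characters column by column.
cleaned = df.apply(
    lambda col: col.str.replace('"', "", regex=False).str.replace("=", "", regex=False)
)

# Turn the dagger into NaN so dropna() can see it.
cleaned = cleaned.replace("\u2020", np.nan)

# dropna() returns a new frame, so assign the result, then cast to int.
cleaned = cleaned.dropna().astype(int)
```

I pass np.nan rather than None as the replacement because None as a replacement value has had special meaning in DataFrame.replace in some pandas versions.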
Upvotes: 1
Reputation: 31662
You could do it with convert_objects:
df = df.convert_objects(convert_numeric=True)
df.dropna(inplace=True)
It'll change all non-numeric values to NaN automatically, and dropna then removes those rows.
EDIT: I got a warning when doing it that way:
FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
So you could use apply with pd.to_numeric instead, passing errors='coerce' so that unparseable values become NaN:
df = df.apply(pd.to_numeric, errors='coerce')
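Putting the pieces together, here is a rough end-to-end sketch under some assumptions: the frame, the column names, and the `cols` list are hypothetical stand-ins for your 30 columns, and only those columns are coerced so that any genuinely textual columns are left alone.

```python
import pandas as pd

# Hypothetical frame; the names are illustrative only.
df = pd.DataFrame({
    "id": ["x", "y", "z"],
    "n1": ["1", "\u2020", "3"],   # one junk value that cannot be parsed as a number
    "n2": ["4", "5", "6"],
})
cols = ["n1", "n2"]  # hypothetical list of the columns to convert

starting_row_count = len(df.index)

# Coerce unparseable values to NaN, only in the chosen columns.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Drop rows with NaN in any of those columns, then cast them to int.
# dropna() returns a new frame, so the result is assigned back.
df = df.dropna(subset=cols).astype({col: int for col in cols})

# Number of rows removed by the cleanup.
print(starting_row_count - len(df.index))
```

The cast to int comes after dropna because a column that still contains NaN cannot be converted to a plain integer dtype.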
Upvotes: 2