Sergei

Reputation: 173

Better way to convert pandas dataframe columns to numeric

I have a DataFrame in which some columns have dtype object because of a few funky data entries (a lone "." or similar).

I have been able to correct this by identifying the object columns and then doing this:

obj_cols = df.loc[:, df.dtypes == object]
conv_cols = obj_cols.convert_objects(convert_numeric='force')

This works fine and allows me to run the regression I need, but it generates this warning:

FutureWarning: convert_objects is deprecated.

Is there a better way to do this that avoids the warning? I also tried constructing a lambda function, but that didn't work.

Upvotes: 7

Views: 21455

Answers (2)

MissBleu

Reputation: 175

If you have a sample data frame:

import pandas as pd

# Each month column holds one stray string among the numbers
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)

and you want to get rid of the strings in the columns that should be numeric, you can do this with pd.to_numeric:

cols = ['Jan', 'Feb', 'Mar']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)

Your new DataFrame will have NaN in place of the 'wacky' data.
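The call above passes axis=1, which applies pd.to_numeric row by row. As a minimal sketch (reusing the sample data from this answer, nothing else assumed), the same conversion can be done column by column, which is the default for DataFrame.apply, and the result can be checked through the dtypes:

```python
import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)

cols = ['Jan', 'Feb', 'Mar']

# Default axis=0: pd.to_numeric receives one column at a time.
# errors='coerce' turns anything that cannot be parsed into NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

print(df.dtypes)  # inspect the dtypes of the month columns
print(df)         # 'e', 'f' and 'g' should now show as NaN
```

Because NaN is a float, the converted columns end up as floating point even where the valid entries were integers.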

Upvotes: 3

Vaishali

Reputation: 38425

convert_objects is deprecated. Use pd.to_numeric instead. You can add the parameter errors='coerce' to convert bad, non-numeric values to NaN.

conv_cols = obj_cols.apply(pd.to_numeric, errors='coerce')

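The function will be applied to every column of the DataFrame. With errors='coerce', values that can be parsed as numbers are converted and everything else (e.g. non-digit strings or dates) becomes NaN, so a column that holds genuine text ends up entirely NaN rather than being left alone. If obj_cols includes such columns, restrict the call to the columns that are supposed to be numeric.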
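To illustrate the point about text columns, here is a rough sketch on made-up data. The column names and values are hypothetical, and the "keep a column only if at least one value parsed" rule is just one possible heuristic, not something pandas does for you:

```python
import pandas as pd

# Hypothetical data: 'x' and 'y' should be numeric but contain stray entries,
# while 'name' is genuine text.
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'x': ['1.5', '.', '3'],
    'y': [1, 'n/a', 3],
})

# Coerce every candidate column; unparseable values become NaN.
converted = df[['name', 'x', 'y']].apply(pd.to_numeric, errors='coerce')

# 'name' is now all NaN, so only write back columns where something parsed.
keep = [c for c in converted.columns if converted[c].notna().any()]
df[keep] = converted[keep]

print(keep)
print(df)
```

Listing the numeric columns explicitly, when you know them, is simpler and safer than a heuristic like this.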

Upvotes: 14
