reverseScoopShoot

Reputation: 67

Python filtering for numeric and string in a single data frame column

I have a data frame where a single column Value has strings. Some of these string values are numeric. I realize that since they are in the same column these are both treated as strings, but I need to differentiate between those that are actually strings and those that are actually numeric.

I want to create two additional columns called Value_Num and Value_Txt then drop the original Value column.

I had a script running that used the following command:

df['Value_Num'] = df.Value.convert_objects(convert_numeric=True)

That worked perfectly. However, after upgrading to a newer version of pandas I get this warning:

FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

I can't figure out how to rewrite my command with that pd.to_numeric function.

Thanks.

Upvotes: 2

Views: 4476

Answers (1)

unutbu

Reputation: 879691

To avoid the FutureWarning issued by convert_objects, you could use pd.to_numeric with errors='coerce':

pd.to_numeric(df['Value'], errors='coerce')

This sets non-numeric strings to NaN -- exactly what we want for Value_Num.
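To see the coercion in isolation (the sample values here are just for illustration):

```python
import pandas as pd

s = pd.Series(['1', 'foo', '3.14e6'])

# Strings that parse as numbers are converted; everything else becomes NaN
result = pd.to_numeric(s, errors='coerce')
```

`'1'` and `'3.14e6'` come back as floats, while `'foo'` is coerced to NaN.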

We can then use pd.notnull to identify the rows in Value_Num with non-NaN values and set these rows to NaN in the Value column. Thus the Value column obtains the values we want for the Value_Txt column. Renaming Value to Value_Txt completes the task:


import numpy as np
import pandas as pd
df = pd.DataFrame({'Value':['1', 'foo', '1.23', 'bar', '3.14e6']})
df['Value_Num'] = pd.to_numeric(df['Value'], errors='coerce')
mask = pd.notnull(df['Value_Num'])
df.loc[mask, 'Value'] = np.nan
df = df.rename(columns={'Value':'Value_Txt'})
print(df)

yields

  Value_Txt   Value_Num
0       NaN        1.00
1       foo         NaN
2       NaN        1.23
3       bar         NaN
4       NaN  3140000.00
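As a side note, you can reach the same result without mutating Value in place, using Series.where to keep only the entries where coercion failed (this where-based variant is my sketch, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'Value': ['1', 'foo', '1.23', 'bar', '3.14e6']})

# Numeric-looking strings are converted; genuine text becomes NaN
df['Value_Num'] = pd.to_numeric(df['Value'], errors='coerce')

# Keep Value only where coercion failed, i.e. where the entry is truly text
df['Value_Txt'] = df['Value'].where(df['Value_Num'].isnull())

# Drop the original Value column, as the question requested
df = df.drop('Value', axis=1)
```

Series.where replaces every row where the condition is False with NaN, so Value_Txt holds only the non-numeric strings.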

Upvotes: 1
