Reputation: 67
I have a data frame where a single column Value
has strings. Some of these string values are numeric. I realize that since they are in the same column these are both treated as strings, but I need to differentiate between those that are actually strings and those that are actually numeric.
I want to create two additional columns called Value_Num
and Value_Txt
then drop the original Value
column.
I had a script running that used the following command:
df['Value_Num'] = df.Value.convert_objects(convert_numeric=true)
That worked perfectly. However, in transitioning to a newer version I get this error:
FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
I can't figure out how to rewrite my command with that pd.to_numeric
function.
Thanks.
Upvotes: 2
Views: 4476
Reputation: 879691
To avoid the FutureWarning
issued by convert_objects
, you could use pd.numeric
with errors='coerce'
:
pd.to_numeric(df['Value'], errors='coerce')
This sets non-numeric strings to NaN
-- exactly what we want for Value_Num
.
We can then use pd.notnull
to identify the rows in Value_Num
with non-NaN values and set these rows to NaN
in the Value
column. Thus the Value
column obtains the values we want for the Value_Txt
column. Renaming Value
to Value_Txt
completes the task:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Value':['1', 'foo', '1.23', 'bar', '3.14e6']})
df['Value_Num'] = pd.to_numeric(df['Value'], errors='coerce')
mask = pd.notnull(df['Value_Num'])
df.loc[mask, 'Value'] = np.nan
df = df.rename(columns={'Value':'Value_Txt'})
print(df)
yields
Value Value_Num type
0 1 1.00 False
1 1.23 1.23 False
2 foo NaN True
3 3.14e6 3140000.00 False
Upvotes: 1