Reputation: 337
I have a dataframe like the following:
plan type hour status code
A cont 0 ok 010.0
A cont 2 ok 025GWA
A cont 0 notok 010VVT
A cont 0 other 6.05
A vend 1 ok 6.01
The column code has a few string characters with different letters. In the end I would like to transform the 'code' column to float. I tried:
df['code'] = df['code'].str.extract('(\d+)').astype(float)
but with this I got:
plan type hour status code
A cont 0 ok 10.0
A cont 2 ok 25.0
A cont 0 notok 10.0
A cont 0 other 6.0
A vend 1 ok 6.0
How can I get a result like the following?
plan type hour status code
A cont 0 ok 10.00
A cont 2 ok 25.00
A cont 0 notok 10.00
A cont 0 other 6.05
A vend 1 ok 6.01
Upvotes: 5
Views: 9583
Reputation: 76917
Use (\d*\.?\d*)
In [441]: df['code'].str.extract('(\d*\.?\d*)', expand=False).astype(float)
Out[441]:
0 10.00
1 25.00
2 10.00
3 6.05
4 6.01
Name: code, dtype: float64
Upvotes: 3
Reputation: 402493
Instead of than extraction, you could consider a substitution-based approach.
Use str.replace
, and then convert to float with an astype
/to_numeric
conversion.
df.code.str.replace('[^\d.]', '').astype(float)
Or,
pd.to_numeric(df.code.str.replace('[^\d.]', ''), errors='coerce')
0 10.00
1 25.00
2 10.00
3 6.05
4 6.01
Name: code, dtype: float64
Upvotes: 6