Thabra
Thabra

Reputation: 337

Pandas - Remove strings from a float number in a column

I have a dataframe like the following:

plan type  hour status     code
A    cont   0    ok       010.0
A    cont   2    ok      025GWA
A    cont   0    notok   010VVT
A    cont   0    other     6.05
A    vend   1    ok        6.01

The column code has a few string characters with different letters. In the end I would like to transform the 'code' column to float. I tried:

df['code'] = df['code'].str.extract('(\d+)').astype(float)

but with this I got:

plan type  hour status     code
A    cont   0    ok        10.0
A    cont   2    ok        25.0 
A    cont   0    notok     10.0
A    cont   0    other      6.0
A    vend   1    ok         6.0

How can I get a result like the following?

plan type  hour status     code
A    cont   0    ok       10.00
A    cont   2    ok       25.00
A    cont   0    notok    10.00
A    cont   0    other     6.05
A    vend   1    ok        6.01

Upvotes: 5

Views: 9583

Answers (2)

Zero
Zero

Reputation: 76917

Use (\d*\.?\d*)

In [441]: df['code'].str.extract('(\d*\.?\d*)', expand=False).astype(float)
Out[441]:
0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64

Upvotes: 3

cs95
cs95

Reputation: 402493

Instead of than extraction, you could consider a substitution-based approach.

Use str.replace, and then convert to float with an astype/to_numeric conversion.

 df.code.str.replace('[^\d.]', '').astype(float)

Or,

pd.to_numeric(df.code.str.replace('[^\d.]', ''), errors='coerce')

0    10.00
1    25.00
2    10.00
3     6.05
4     6.01
Name: code, dtype: float64

Upvotes: 6

Related Questions