gtomer

Reputation: 6564

Leave in column only digits and decimal point

I have a column of strings, each containing letters and a number. The number sometimes has a decimal point and sometimes does not. I want to convert the number to float. Example dataframe:

import pandas as pd

df = pd.DataFrame({'colA': ['q7.8', 'g5.3', '4.5r', 'john7']})

The updated column should contain: 7.8, 5.3, 4.5, 7.0.

There are no systematic rules for the number of letters or their position.

Thanks

Upvotes: 1

Views: 269

Answers (1)

Henry Ecker

Reputation: 35626

Assuming each cell contains only one number, we can use str.extract and then astype to convert to float:

df['colA'] = df['colA'].str.extract(r'(\d+(?:\.\d+)?)').astype('float')
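For reference, here is a minimal self-contained sketch of the same idea applied to the dataframe from the question. The only addition is expand=False, which is an optional choice on my part so that str.extract returns a Series instead of a one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['q7.8', 'g5.3', '4.5r', 'john7']})

# One capturing group: one or more digits, optionally followed by
# a decimal point and more digits. The inner group is non-capturing.
pattern = r'(\d+(?:\.\d+)?)'

# expand=False (an optional choice here) returns a Series rather than
# a one-column DataFrame; astype then converts the strings to float.
df['colA'] = df['colA'].str.extract(pattern, expand=False).astype('float')

print(df['colA'].tolist())  # [7.8, 5.3, 4.5, 7.0]
```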

Many regexes are available at "How to extract a floating number from a string" if additional considerations such as exponents or positive/negative signs are needed, for example:

df['colA'] = df['colA'].str.extract(
    r'([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)'
).astype('float')
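As a rough illustration of what the longer pattern picks up, here is a small sketch. The sample strings and the name `samples` are made up for this example and are not from the question:

```python
import pandas as pd

# Hypothetical inputs: a signed number with an exponent, a number with
# no leading digit, a number with a trailing dot, and a string
# containing no number at all.
samples = pd.Series(['x-1.5e3', 'val+.25', 'abc42.', 'n/a'])

pattern = r'([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)'

# Strings with no match become NaN, which survives the float conversion.
result = samples.str.extract(pattern, expand=False).astype('float')

print(result.tolist())
```

With these inputs I would expect -1500.0, 0.25 and 42.0 for the first three rows and NaN for the last one, since 'n/a' contains nothing the pattern can match.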

*Note: with this approach, extract needs exactly one capturing group, because each capturing group becomes its own column in the result.
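To see why the number of capturing groups matters, this short sketch compares a two-group pattern with the one-group pattern. The Series `s` and the variable names are only for illustration:

```python
import pandas as pd

s = pd.Series(['q7.8', 'john7'])

# Two capturing groups: integer part and decimal part end up
# in two separate columns.
two_groups = s.str.extract(r'(\d+)(\.\d+)?')

# One capturing group (the inner group is non-capturing with ?:),
# so the whole number lands in a single column.
one_group = s.str.extract(r'(\d+(?:\.\d+)?)')

print(two_groups.shape)  # (2, 2)
print(one_group.shape)   # (2, 1)
```

Only the single-column result can be assigned straight back to df['colA'], which is why the inner groups in the patterns above use (?:...).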

df:

   colA
0   7.8
1   5.3
2   4.5
3   7.0

Upvotes: 3
