Reputation: 1255
I am trying to extract only numeric values from all the columns in a list, whether it is on the right, left or middle of any characters.
I have a dataframe that looks like the below:
df = pd.DataFrame({
'A': ['1', 3, "1", "cad -2", 3, 4.876, np.nan],
'B': ['116', 'CAD -2.6399', 'CAD -3', '4 $', '$5%', 'A', '-1.2 2']
})
df
I tried the below code but it is removing - from column "A" row 4 and column "B" row 3
l = ["A", "B"]
for columns in l:
if isinstance(df[columns], object):
df[columns] = df[columns].astype('str').str.extract("([-+]?\d*\.\d+|\d+)").astype(float)
df
I want my final dataframe to look like below:
A B
1 116
3 -2.6399
1 -3
-2 4
3 5
4.876 NaN
NaN -1.2
Upvotes: 1
Views: 4566
Reputation: 18916
Try to use str.extract() and a regex to find float, something like this:
df['B'] = df['B'].str.extract("([-+]?\d*\.\d+|[-+]?\d+)").astype(float)
Note: 0 matches will return np.nan and 2+ matches will return the first match!
Full example:
import pandas as pd
df = pd.DataFrame({
'A': [1, 3, 1, 2, 3, 4, 2],
'B': ['116', 'CAD 2.6399', 'CAD 3', '4 $', '$5%', 'A', '1.2 2']
})
df['B'] = df['B'].str.extract("([-+]?\d*\.\d+|[-+]?\d+)").astype(float)
print(df)
Returns:
A B
0 1 116.0000
1 3 2.6399
2 1 3.0000
3 2 4.0000
4 3 5.0000
5 4 NaN
6 2 1.2000
UPDATE: you can use this for multiple object columns:
for column in [i for i in df.columns if df[i].dtype == 'object']:
df[column] = df[column].astype(str).str.extract("([-+]?\d*\.\d+|[-+]?\d+)").astype(float)
Based on regex found in this answer: How to extract a floating number from a string
Upvotes: 2
Reputation: 1
Try this:
def get_first_nbr_from_str(input_str):
'''
:param input_str: strings that contains digit and words
:return: the number extracted from the input_str
demo:
'ab324.23.123xyz': 324.23
'.5abc44': 0.5
'''
if not input_str and not isinstance(input_str, str):
return 0
out_number = ''
for ele in input_str:
if (ele == '.' and '.' not in out_number) or ele.isdigit():
out_number += ele
elif out_number:
break
return float(out_number)
And then:
df['B'].apply(get_first_nbr_from_str)
Upvotes: 0