Reputation: 3713
Given this data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
})
df
ID value
0 a None
1 b NaN
2 c 6D
3 d 7
4 e 10D
5 f NONE
6 g x
7 h 10D aaa
8 i 1 D
9 j 10 D aa
10 k 7
I'd like to extract the number where one is present and return 0 otherwise, for any of the messy cases shown above.
The desired result is:
ID value
0 a 0
1 b 0
2 c 6
3 d 7
4 e 10
5 f 0
6 g 0
7 h 10
8 i 1
9 j 10
10 k 7
Thanks in advance!
Upvotes: 2
Views: 62
Reputation: 2015
Here is my approach, using re.findall and apply:
import re
# Take the first run of digits as an int, or 0 when the cell has no digits
df['value'].apply(lambda x: int(re.findall(r'\d+', str(x))[0]) if re.findall(r'\d+', str(x)) else 0)
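The same "first run of digits, else 0" idea can also be written without a Python-level lambda. The following is only a sketch, assuming the sample frame from the question; `first_digits` and `result` are illustrative names, and Series.str.extract is swapped in for re.findall.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
    'value': ['None', np.nan, '6D', '7', '10D', 'NONE', 'x',
              '10D aaa', '1 D', '10 D aa', 7],
})

# Cast to str first so NaN and the integer 7 are handled like any other cell,
# then capture the first run of digits; cells without digits become missing.
first_digits = df['value'].astype(str).str.extract(r'(\d+)', expand=False)

# Fill the missing cells with '0' and convert the digit strings to integers.
result = df.assign(value=first_digits.fillna('0').astype(int))
print(result)
```

Because `assign` returns a new frame, the original `df` is left untouched and the `ID` column is carried over unchanged.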
Upvotes: 1
Reputation: 473873
Alternatively, you can apply a function to the dataframe via applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()), following the EAFP principle and catching multiple exceptions while extracting the digits:
import re

def get_number(item):
    # EAFP: attempt the extraction and fall back to 0 if anything fails
    try:
        return int(re.search(r"\d+", str(item)).group(0))
    except (AttributeError, ValueError, IndexError):
        return 0

print(df.applymap(get_number))
Prints:
ID value
0 0 0
1 0 0
2 0 6
3 0 7
4 0 10
5 0 0
6 0 0
7 0 10
8 0 1
9 0 10
10 0 7
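Note that the output above also turns every `ID` into 0, because the function runs on every cell of the frame. If only the `value` column should be converted, one possible variant (a sketch, assuming the sample frame from the question; `result` is an illustrative name) is to map the same function over that single column:

```python
import re

import numpy as np
import pandas as pd


def get_number(item):
    """Return the first run of digits in item as an int, or 0 if none."""
    try:
        return int(re.search(r"\d+", str(item)).group(0))
    except (AttributeError, ValueError):
        # AttributeError: re.search found nothing and returned None
        return 0


df = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
    'value': ['None', np.nan, '6D', '7', '10D', 'NONE', 'x',
              '10D aaa', '1 D', '10 D aa', 7],
})

# Series.map applies the function to the 'value' cells only, so 'ID' survives.
result = df.assign(value=df['value'].map(get_number))
print(result)
```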
Upvotes: 1
Reputation: 8767
Try the following using Series.str.replace and fillna:
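import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
})
# .str is a Series accessor, so work on the 'value' column rather than the whole frame
digits = df['value'].fillna(0).astype(str).str.replace(r'\D+', '', regex=True)
# Cells with no digits are now empty strings; turn them into '0' before casting
df['value'] = digits.replace('', '0').astype(int)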
Upvotes: 1