Reputation: 879
I have a dataframe (df):
import numpy as np
import pandas as pd
df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
I can find the numbers in it:
df['B'] = df.A.replace(regex={r'[^\w]':'', r'^\D+':'', r'\D+':' '}).str.split(r'\s')
                   A           B
0              54321         NaN
1        it is 54322     [54322]
2  is it 54323 or 4?  [54323, 4]
3                NaN         NaN
But when I try to find the highest number for each row:
df['C'] = df['B'].apply(lambda x : max(x))
I get:
TypeError: 'float' object is not iterable
Upvotes: 0
Views: 116
Reputation: 862831
Use a lambda function with if-else. I also added a conversion to integers so that max compares the values numerically:
f = lambda x : max(int(y) for y in x) if isinstance(x, list) else np.nan
df['C'] = df['B'].apply(f)
print (df)
                   A           B        C
0              54321         NaN      NaN
1        it is 54322     [54322]  54322.0
2  is it 54323 or 4?  [54323, 4]  54323.0
3                NaN         NaN      NaN
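To see why the isinstance check is needed: rows where nothing was split hold np.nan, which is a plain float, and calling max() on a float raises the TypeError from the question. Below is a minimal sketch of that behaviour. The name safe_max is a hypothetical helper, and column B is typed in by hand here rather than produced by the regex.

```python
import numpy as np
import pandas as pd

# Column B as in the question: lists of digit strings, or NaN where nothing was split.
# (Built by hand for illustration.)
b = pd.Series([np.nan, ['54322'], ['54323', '4'], np.nan])

# NaN is a float, so max() cannot iterate over it.
try:
    max(b[0])
except TypeError as exc:
    error_message = str(exc)

def safe_max(x):
    """Hypothetical helper: largest integer in a list of digit strings, NaN for non-lists."""
    if isinstance(x, list):
        return max(int(y) for y in x)
    return np.nan

c = b.apply(safe_max)
```

The int conversion matters because strings compare character by character: max(['9', '10']) returns '9', not '10'.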
Alternatively, use Series.str.extractall, which returns one row per match under a MultiIndex. Convert the matches to integers and take the max per first index level:
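df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
# groupby(level=0).max() takes the max per original row (max(level=0) no longer exists in pandas 2.0+)
df['C'] = df.A.astype(str).str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).max()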
print (df)
                   A        C
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
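It may help to look at the intermediate result of extractall before reducing it. The following is a sketch that splits the one-liner into steps. The variable names matches and per_row_max are illustrative only, and groupby(level=0) is used for the per-row reduction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

# One row per match; the index is (original row label, match number).
# Column 0 holds the single capture group.
matches = df['A'].astype(str).str.extractall(r'(\d+)')[0].astype(int)

# Grouping by the first index level collapses the matches back to one value per row.
per_row_max = matches.groupby(level=0).max()

# Assignment aligns on the index, so row 3 (no digits found) becomes NaN.
df['C'] = per_row_max
```

Row 2 contributes two entries to matches, (2, 0) and (2, 1), and row 3 contributes none, which is why C ends up as a float column with a NaN in the last row.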
Upvotes: 1
Reputation: 195468
Another solution:
import re
df['B'] = df['A'].apply(lambda x: pd.Series(re.findall(r'\d+', str(x))).astype(float).max())
print(df)
Prints:
                   A        B
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
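The same idea can probably be written without building a pd.Series for every row. This is only a sketch of a variant, not the code above: max_number is a hypothetical helper name, and it returns NaN when no digits are found.

```python
import re

import numpy as np
import pandas as pd

def max_number(value):
    """Hypothetical helper: largest run of digits in str(value), or NaN if there is none."""
    numbers = [int(n) for n in re.findall(r'\d+', str(value))]
    return max(numbers) if numbers else np.nan

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
df['B'] = df['A'].apply(max_number)
```

str(np.nan) is 'nan', which contains no digits, so the missing row falls through to the NaN branch.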
Upvotes: 1