R. Cox

Reputation: 879

Find max in Pandas dataframe column of lists

I have a dataframe (df):

df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

I can find the numbers in it:

df['B'] = df.A.replace(regex={'[^\w]':'','^\D+':'','\D+':' '}).str.split('\s')

                   A           B
0              54321         NaN
1        it is 54322     [54322]
2  is it 54323 or 4?  [54323, 4]
3                NaN         NaN

But when I try to find the highest number for each row:

df['C'] = df['B'].apply(lambda x : max(x))

I get:

TypeError: 'float' object is not iterable

Upvotes: 0

Views: 116

Answers (2)

jezrael

Reputation: 862831

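Use a lambda function with if-else so that non-list values (the NaN rows) return NaN instead of being passed to max. The values are also converted to integers, so that max compares them numerically rather than as strings: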

f = lambda x : max(int(y) for y in x) if isinstance(x, list) else np.nan
df['C'] = df['B'].apply(f)
print (df)
                   A           B        C
0              54321         NaN      NaN
1        it is 54322     [54322]  54322.0
2  is it 54323 or 4?  [54323, 4]  54323.0
3                NaN         NaN      NaN
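
As a variation on the lambda above, here is a minimal sketch of the same idea written as a named function. The name max_number is hypothetical, and skipping tokens that are not decimal digits (such as an empty string left over from a split) is an added assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

def max_number(tokens):
    """Return the largest integer among string tokens, or NaN if there is none."""
    if not isinstance(tokens, list):
        return np.nan  # NaN cells are floats, not lists
    # Assumption: ignore tokens that are not plain decimal digits (e.g. '')
    numbers = [int(t) for t in tokens if str(t).isdecimal()]
    return max(numbers) if numbers else np.nan

# Stand-in for column B, with one extra row containing an empty token
b = pd.Series([np.nan, ['54322'], ['54323', '4'], ['5', ''], np.nan])
c = b.apply(max_number)
print(c)
```

Returning NaN for an empty or all-non-numeric list keeps the result column aligned with the input rows instead of raising an error.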

Or use Series.str.extractall, which returns one row per match under a MultiIndex, then convert to int and take the max per first index level (the original row):

df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
# max(level=0) was removed in pandas 2.0; group on the first index level instead
df['C'] = df.A.astype(str).str.extractall(r'(\d+)').astype(int).groupby(level=0).max()
print (df)
                   A        C
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
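
To see why this works, the chain can be split into steps. This is only a sketch: the names matches and row_max are introduced here for illustration, and groupby(level=0).max() is used as the grouping step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

# One row per regex match; the index is (original row, match number).
# Column 0 holds the first (and only) capture group.
matches = df['A'].astype(str).str.extractall(r'(\d+)')[0].astype(int)
print(matches)

# Collapse the match level, keeping the largest number per original row.
row_max = matches.groupby(level=0).max()

# Rows without any match (row 3 here) are absent from row_max,
# so index alignment on assignment fills them with NaN.
df['C'] = row_max
print(df)
```

Because the assignment aligns on the index, no explicit NaN handling is needed for rows that contain no digits.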

Upvotes: 1

Andrej Kesely

Reputation: 195468

Another solution, using re.findall on each value and taking the max of the numbers found:

import re
df['B'] = df['A'].apply(lambda x: pd.Series(re.findall(r'\d+', str(x))).astype(float).max())
print(df)

Prints:

                   A        B
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN

Upvotes: 1
