Kalenji

Reputation: 407

Python - extract the largest number from the string in the column into a new column

I have a data frame with a string column that contains some numbers. I am trying to extract the largest number from that column into a separate column. My regex only captures the first number in each string; how can I update it to extract the largest one?

import pandas as pd

data = [['tom 11 abc 100', 10], ['nick12 text 1 1000', 15], ['juli078 aq 199 299', 14]]

df = pd.DataFrame(data, columns = ['col1', 'col2'])
df["Number"] = df['col1'].str.extract(r'(\d+(?:\.\d+)?)')

print(df)

So the output should be as follows with the new column Number.

                 col1  col2  Number
0      tom 11 abc 100    10     100
1  nick12 text 1 1000    15    1000
2  juli078 aq 199 299    14     299

Upvotes: 0

Views: 153

Answers (1)

Quang Hoang

Reputation: 150765

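Use extractall to get all the digit groups, convert them to integers, then take the maximum per original row (index level 0):

# use pat = r'(\d+)' if you also want digits mixed into text, e.g. `078`
pat = r'\b(\d+)\b'
# extractall returns one row per match, indexed by (original row, match number);
# group by the original row (level 0) and keep the largest value
df['Number'] = df['col1'].str.extractall(pat)[0].astype(int).groupby(level=0).max()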

Output:

                 col1  col2  Number
0      tom 11 abc 100    10     100
1  nick12 text 1 1000    15    1000
2  juli078 aq 199 299    14     299
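
As an alternative, the same result can be reached with the standard `re` module and a small helper applied row by row. This is only a sketch: `largest_number` is a hypothetical helper name, not part of pandas, and it assumes that "number" means a standalone run of digits (so the `12` in `nick12` is ignored, as with the `\b(\d+)\b` pattern above). It returns `None` for strings that contain no such number.

```python
import re

import pandas as pd

data = [['tom 11 abc 100', 10], ['nick12 text 1 1000', 15], ['juli078 aq 199 299', 14]]
df = pd.DataFrame(data, columns=['col1', 'col2'])


def largest_number(text, pat=r'\b\d+\b'):
    """Hypothetical helper: largest standalone integer in text, or None if there is none."""
    numbers = [int(match) for match in re.findall(pat, text)]
    return max(numbers) if numbers else None


# Apply the helper to every string in col1
df['Number'] = df['col1'].apply(largest_number)
print(df)
```

The row-by-row `apply` is likely slower than `extractall` on large frames, but it makes the handling of strings without any number explicit.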

Upvotes: 2
