Reputation: 407
I have a data frame with a column of strings that contain some numbers. I am trying to extract the largest number in each string into a separate column. My regex only captures the first number, and I wonder how I can update it to extract the largest one.
import pandas as pd
data = [['tom 11 abc 100', 10], ['nick12 text 1 1000', 15], ['juli078 aq 199 299', 14]]
df = pd.DataFrame(data, columns = ['col1', 'col2'])
df["Number"] = df['col1'].str.extract(r'(\d+(?:\.\d+)?)')
print(df)
So the output should be as follows with the new column Number.
                 col1  col2  Number
0      tom 11 abc 100    10     100
1  nick12 text 1 1000    15    1000
2  juli078 aq 199 299    14     299
Upvotes: 0
Views: 153
Reputation: 150765
Use extractall to get all the digit groups, convert them to integers, then take the max on level 0 of the index (the original row):
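# use pat = r'(\d+)' if you also want digits embedded in text, e.g. `078`
pat = r'\b(\d+)\b'
# extractall returns one row per match; group by the original row (level 0) and take the max.
# .max(level=0) was removed in pandas 2.0, so use groupby(level=0) instead.
df['Number'] = df['col1'].str.extractall(pat)[0].astype(int).groupby(level=0).max()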
Output:
                 col1  col2  Number
0      tom 11 abc 100    10     100
1  nick12 text 1 1000    15    1000
2  juli078 aq 199 299    14     299
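On the question's data both patterns happen to give the same maxima, so here is a minimal sketch of where they differ. The helper name largest_number and the sample strings are assumptions made up for illustration, not part of the question. It also shows that a row with no digits ends up as NaN, which turns the result into floats.

```python
import pandas as pd

# Hypothetical sample data, chosen so the two patterns disagree on row 1.
s = pd.Series(["tom 11 abc 100", "item500 qty 20", "no digits here"])

def largest_number(series, pat):
    """Largest integer matched by `pat` in each row (illustrative helper)."""
    # extractall indexes its result by (original row, match number).
    matches = series.str.extractall(pat)[0].astype(int)
    # Collapse back to one value per original row, then restore rows
    # that had no match at all; those become NaN.
    return matches.groupby(level=0).max().reindex(series.index)

standalone = largest_number(s, r"\b(\d+)\b")  # skips the 500 glued to "item"
embedded = largest_number(s, r"(\d+)")        # counts the 500 as well

print(pd.DataFrame({"text": s, "standalone": standalone, "embedded": embedded}))
```

If rows without digits are possible in your data, decide up front whether NaN is acceptable or whether you would rather fill it (for example with fillna) before converting back to integers.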
Upvotes: 2