Reputation: 9348
To extract numbers longer than 4 digits from a DataFrame column using a regex, I have these lines:
import pandas as pd
data = {'Company': ["0652369- INTER SUPPORT LLP, 202011",
"CIRCLE TRADING LTD 1-593616, 2020-06, 0201",
"Area Food Service Co., Ltd.-6958047, 2020-07"]}
df = pd.DataFrame(data)
df['co'] = df['Company'].str.extract(r'(\d+).{5,}')
print(df['co'])
Output:
0 0652369
1 1
2 6958047
The result is wrong for the second row, which should return '593616'.
What's the right way to write it?
Upvotes: 2
Views: 191
Reputation: 41387
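Try extracting (\d{5,}), which matches a run of five or more consecutive digits. Your pattern (\d+).{5,} captures the first run of digits of any length, as long as at least five more characters follow it, so on the second row it stops at the lone '1':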
df['co'] = df['Company'].str.extract(r'(\d{5,})')
# Company co
# 0 0652369- INTER SUPPORT LLP, 202011 0652369
# 1 CIRCLE TRADING LTD 1-593616, 2020-06, 0201 593616
# 2 Area Food Service Co., Ltd.-6958047, 2020-07 6958047
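If a row can hold more than one long number (the first row also contains 202011), a minimal sketch along these lines may help. It assumes the same sample data as the question; the variable names `first` and `all_runs` are only illustrative. `str.extract` returns the first match only, while `str.findall` returns every match per row:

```python
import pandas as pd

data = {'Company': ["0652369- INTER SUPPORT LLP, 202011",
                    "CIRCLE TRADING LTD 1-593616, 2020-06, 0201",
                    "Area Food Service Co., Ltd.-6958047, 2020-07"]}
df = pd.DataFrame(data)

# First run of 5 or more digits in each row (a Series, because expand=False)
first = df['Company'].str.extract(r'(\d{5,})', expand=False)

# Every run of 5 or more digits in each row (a Series of lists)
all_runs = df['Company'].str.findall(r'\d{5,}')

print(first.tolist())     # ['0652369', '593616', '6958047']
print(all_runs.tolist())  # [['0652369', '202011'], ['593616'], ['6958047']]
```

The raw-string prefix (r'...') keeps Python from treating \d as a string escape, which newer Python versions warn about.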
Upvotes: 3