Mark K

Reputation: 9348

Regex to extract numbers longer than 4 digits from a DataFrame column

To extract numbers longer than 4 digits from a DataFrame column using regex, I have this code:

import pandas as pd

data = {'Company': ["0652369- INTER SUPPORT LLP, 202011",
"CIRCLE TRADING LTD 1-593616, 2020-06, 0201",
"Area  Food Service Co., Ltd.-6958047, 2020-07"]}

df = pd.DataFrame(data)

df['co'] = df['Company'].str.extract(r'(\d+).{5,}')
print(df['co'])

Output:

0    0652369
1          1
2    6958047

It gets the second row wrong: it should return '593616'.

What's the right way to write the pattern?

Upvotes: 2

Views: 191

Answers (1)

tdy

Reputation: 41387

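Your pattern (\d+).{5,} captures any run of digits that is followed by at least five more characters of any kind, so in the second row it captures the lone 1. Instead, extract a run of five or more consecutive digits with (\d{5,}):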

df['co'] = df['Company'].str.extract(r'(\d{5,})')

#                                          Company       co
# 0             0652369- INTER SUPPORT LLP, 202011  0652369
# 1     CIRCLE TRADING LTD 1-593616, 2020-06, 0201   593616
# 2  Area  Food Service Co., Ltd.-6958047, 2020-07  6958047
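If it helps to see the difference without pandas, here is a minimal sketch using the standard re module. It assumes that str.extract behaves like re.search here, in that it returns the capture group from the first match in each string. The helper first_group is a hypothetical name used only for this illustration.

```python
import re

companies = [
    "0652369- INTER SUPPORT LLP, 202011",
    "CIRCLE TRADING LTD 1-593616, 2020-06, 0201",
    "Area  Food Service Co., Ltd.-6958047, 2020-07",
]

def first_group(pattern, text):
    """Hypothetical helper: group 1 of the first match, or None if no match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Original pattern: digits followed by at least five characters of any kind.
original = [first_group(r'(\d+).{5,}', s) for s in companies]

# Corrected pattern: a run of five or more consecutive digits.
fixed = [first_group(r'(\d{5,})', s) for s in companies]

# All runs of five or more digits, not just the first one per string.
all_runs = [re.findall(r'\d{5,}', s) for s in companies]

print(original)  # ['0652369', '1', '6958047']
print(fixed)     # ['0652369', '593616', '6958047']
print(all_runs)  # [['0652369', '202011'], ['593616'], ['6958047']]
```

Note that the first row also contains 202011, which is six digits long. Taking only the first match returns 0652369 because it appears earlier in the string, so this approach relies on the number you want coming before any other long digit run.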

Upvotes: 3
