Use regex to identify 4- to 5-digit numbers (consecutive digits, i.e. no whitespace or special characters), excluding numbers with leading 0's

Question

I am trying to use regular expressions to identify 4- to 5-digit numbers. The code below works in all cases except when consecutive 0's precede a one-, two- or three-digit number. I don't want '0054', '0008', or '0009' to match, but I do want '10354', '10032', '9005', and '9000' to match. Is there a good way to implement this using regular expressions? Here is my current code, which fails only when leading 0's pad a shorter number out to 4 or 5 characters.

import re

line = 'US Machine Operations | 0054'
match = re.search(r'\d{4,5}', line)
if match is None:
    print(0)
else:
    print(int(match[0]))

Wiktor Stribiżew · Accepted Answer

You may use

(?<!\d)[1-9]\d{3,4}(?!\d)

See the regex demo.

NOTE: In Pandas str.extract, you must wrap the part you want to be returned with a capturing group, a pair of unescaped parentheses. So, you need to use

(?<!\d)([1-9]\d{3,4})(?!\d)

Example:

df2['num_col'] = df2.Warehouse.str.extract(r'(?<!\d)([1-9]\d{3,4})(?!\d)')

Since you can simply use a capturing group, you may also use an equivalent regex without the lookbehind:

(?:^|\D)([1-9]\d{3,4})(?!\d)
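To illustrate how the two variants relate, here is a small sketch using plain `re` rather than Pandas. The sample string and variable names are made up for the illustration. Both patterns capture the same numbers in group 1, but the second one consumes the non-digit character to the left, so its whole match (group 0) differs:

```python
import re

# Variant with a lookbehind: nothing to the left is consumed.
lookbehind = re.compile(r'(?<!\d)([1-9]\d{3,4})(?!\d)')
# Variant with a non-capturing group: the preceding non-digit is consumed.
consuming = re.compile(r'(?:^|\D)([1-9]\d{3,4})(?!\d)')

text = "ids: 0054, 10354, 9000"  # hypothetical sample input

# With one capturing group, findall returns the group contents only.
print(lookbehind.findall(text))
print(consuming.findall(text))

# The whole match differs: the second pattern includes the leading space.
m1 = lookbehind.search(text)
m2 = consuming.search(text)
print(repr(m1.group(0)), repr(m2.group(0)))
```

So the group-based pattern is only interchangeable when you read the capturing group, which is what str.extract does.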


Details

(?<!\d) - no digit immediately to the left
or (?:^|\D) - start of string or non-digit char (a non-capturing group is used so that only 1 capturing group could be accommodated in the pattern and let str.extract only extract what needs extracting)
[1-9] - a non-zero digit
\d{3,4} - three or four digits
(?!\d) - no digit immediately to the right is allowed
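The pieces above can be dropped into the code from the question roughly as follows. This is only a sketch: the helper name `first_number` and the sample lines are assumptions made for the illustration, and it keeps the question's convention of reporting 0 when nothing matches.

```python
import re

# No digit on the left, a non-zero first digit, three or four more digits,
# and no digit on the right.
PATTERN = re.compile(r'(?<!\d)[1-9]\d{3,4}(?!\d)')

def first_number(line):  # hypothetical helper name
    """Return the first 4- or 5-digit number without a leading zero, or 0."""
    match = PATTERN.search(line)
    return int(match.group(0)) if match else 0

samples = [
    'US Machine Operations | 0054',   # leading zeros: rejected
    'US Machine Operations | 10354',  # five digits: accepted
    'Dock 9000 north',                # four digits: accepted
    'Order 123456',                   # six digits: rejected by the lookarounds
]
for sample in samples:
    print(sample, '->', first_number(sample))
```

The lookarounds on both sides are what keep a 4- or 5-digit slice of a longer digit run (such as 123456) from being picked up.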


Python demo:

import re
s = "US Machine Operations | 0054 '0054','0008',or '0009' to be a match, but i would want '10354' or '10032', or '9005', or '9000'"
print(re.findall(r'(?<!\d)[1-9]\d{3,4}(?!\d)', s))
# => ['10354', '10032', '9005', '9000']
