Reputation: 1713

Finding specific pattern of numbers using regular expression in python

I am trying to extract a specific pattern of numbers using a regular expression in Python 3.7. Below are the 4 possible patterns.

Pattern 1 - The length of this pattern is exactly 10 and it cannot start with a zero. It consists of digits only. Ex: '1234567890'

Pattern 2 - The length of this pattern is exactly 11 and it can start with a zero. It consists of digits only. Ex: '01234567890'

Pattern 3 - The length of this pattern is exactly 11 and cannot start with a zero. There is one space after the 5th number and all other characters are numbers. Ex: '12345 67890'

Pattern 4 - The length of this pattern is exactly 12 and can start with a zero. There is one space after the 6th number and all other characters are numbers. Ex: '012345 67890'

Note - The examples above are for illustration only. The actual digits in my string can be anything, e.g. '2345653340', '034945 85730', '000000 00000' or '09876543210'.

Below is what I have tried. It does not return the desired results. How do I go about this?

import re

regex = re.compile(r"(\d)?\d\d\d\d\d(\b)?\d\d\d\d\d")

number1 = regex.findall("number is 1234567890") # For Pattern 1 expected output is '1234567890'
number2 = regex.findall("number is 01234567890") # For Pattern 2 expected output is '01234567890'
number3 = regex.findall("number is 12345 67890") # For Pattern 3 expected output is '12345 67890'
number4 = regex.findall("number is 012345 67890") # For Pattern 4 expected output is '012345 67890'
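
For reference, two things seem to be at work in the attempt above: when a pattern contains capturing groups, re.findall returns the group contents instead of the whole match, and \b is a zero-width assertion, so (\b)? never consumes the space in Patterns 3 and 4. A minimal sketch that reruns the attempt to show this (the names attempt, samples and results are only illustrative):

```python
import re

# The original attempt: two capturing groups, and \b where a space was intended.
attempt = re.compile(r"(\d)?\d\d\d\d\d(\b)?\d\d\d\d\d")

samples = [
    "number is 1234567890",
    "number is 01234567890",
    "number is 12345 67890",
    "number is 012345 67890",
]

# With capturing groups, findall yields one tuple of group contents per match,
# not the matched text itself.
results = [attempt.findall(s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), "->", r)
```

The first two strings give [('', '')] and [('0', '')], and the two strings containing a space give [], because nothing in the pattern can match the space.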

Upvotes: 0

Answers (3)

Giova

Reputation: 2005

Of all the regexes posted so far, this one seems the easiest to write and the fastest to run:

from re import compile
regex = compile(r'\d{11}|[1-9]\d{9}|[1-9]\d{4}\s\d{5}|\d{6}\s\d{5}')
number1 = regex.findall("number is 1234567890")
number2 = regex.findall("number is 01234567890")
number3 = regex.findall("number is 12345 67890") 
number4 = regex.findall("number is 012345 67890")

You get the expected results (findall returns a list of matches):

>>> number1
['1234567890']
>>> number2
['01234567890']
>>> number3
['12345 67890']
>>> number4
['012345 67890']

Andrej Kesely's answer takes 80 steps (regex101.com).
The fourth bird's answer takes 44 steps (regex101.com).
My answer takes 41 steps (regex101.com).
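
Step counts on regex101 describe one input in that site's debugger, so they are only a rough proxy for speed in Python. One way to check is to time the three patterns with timeit. The sketch below is a minimal one: the dictionary labels, the run_all helper and the repeat count are assumptions made here for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# The three patterns posted in this thread, keyed by author (labels are illustrative).
patterns = {
    "Giova": re.compile(r"\d{11}|[1-9]\d{9}|[1-9]\d{4}\s\d{5}|\d{6}\s\d{5}"),
    "The fourth bird": re.compile(
        r"\b(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})\b"
    ),
    "Andrej Kesely": re.compile(
        r"\b0\d{5}\s\d{5}\b|\b[1-9]\d{4}\s\d{5}\b|\b0\d{10}\b|\b[1-9]\d{9}\b"
    ),
}

samples = [
    "number is 1234567890",
    "number is 01234567890",
    "number is 12345 67890",
    "number is 012345 67890",
]


def run_all(regex):
    """Hypothetical helper: apply one compiled pattern to every sample string."""
    return [regex.findall(s) for s in samples]


# Time 2000 passes over the samples for each pattern (repeat count is arbitrary).
timings = {
    name: timeit.timeit(lambda r=regex: run_all(r), number=2000)
    for name, regex in patterns.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s for 2000 runs")
```

On these four strings all three patterns return the same matches, so the comparison is like for like. They differ on other inputs, for example numbers embedded in longer digit runs.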

Upvotes: 1

The fourth bird

Reputation: 163577

You could use an alternation to match the different requirements, and a word boundary \b to prevent the number from being part of a larger word.

\b(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})\b

\b Word boundary
(?: Non-capturing group
- \d{6} \d{5} Pattern 4: 6 digits, a space, 5 digits
- | Or
- [1-9]\d{4} \d{5} Pattern 3: one digit 1-9, 4 digits, a space, 5 digits
- | Or
- [1-9]\d{9} Pattern 1: one digit 1-9, then 9 digits
- | Or
- \d{11} Pattern 2: 11 digits
) Close group
\b Word boundary
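
A short sketch of how this pattern could be used from Python, and of what the word boundaries change. The 12-digit input and the unbounded comparison pattern are made up here for illustration and are not part of the question:

```python
import re

# Pattern from this answer; note the literal spaces inside the alternation.
pattern = re.compile(r"\b(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})\b")

# Same alternation without the word boundaries, for comparison only.
unbounded = re.compile(r"(?:\d{6} \d{5}|[1-9]\d{4} \d{5}|[1-9]\d{9}|\d{11})")

samples = [
    "number is 1234567890",
    "number is 01234567890",
    "number is 12345 67890",
    "number is 012345 67890",
]

# No capturing groups are used, so findall returns the full matched text.
matches = [pattern.findall(s) for s in samples]
for s, m in zip(samples, matches):
    print(repr(s), "->", m)

# Hypothetical 12-digit input that fits none of the four patterns.
too_long = "number is 123456789012"
bounded_result = pattern.findall(too_long)      # []
unbounded_result = unbounded.findall(too_long)  # ['1234567890']
print(bounded_result, unbounded_result)
```

Without \b, the first ten digits of the longer number are reported as a Pattern 1 match. With \b, the longer number is rejected as a whole.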

Regex demo | Python demo

Upvotes: 1

Andrej Kesely

Reputation: 195573

Regex101 (link):

import re

l = ["number is 1234567890",
"number is 01234567890",
"number is 12345 67890",
"number is 012345 67890",

"number is 912345 67890 - dont match",
"number is 02345 67890 - dont match",
"number is 91234567890 - dont match",
"number is 0234567890 - dont match"]

for s in l:
    m = re.findall(r'\b0\d{5}\s\d{5}\b|\b[1-9]\d{4}\s\d{5}\b|\b0\d{10}\b|\b[1-9]\d{9}\b', s)
    print(m)

Prints:

['1234567890']
['01234567890']
['12345 67890']
['012345 67890']
[]
[]
[]
[]

Upvotes: 1
