Amit JS
Amit JS

Reputation: 133

Capture a list of integers in a string using regex python

I am trying to capture List[int] (list of integers which might be seperated by a comma) in a string. However I am not getting the expected result.

>>> txt = '''Automatic face localisation is the prerequisite step of 
facial image analysis for many applications such as facial attribute 
(e.g. expression [64] and age [38]) and facial identity
recognition [45, 31, 55, 11]. A narrow definition of face localisation 
may refer to traditional face detection [53, 62], '''

output

>>> re.findall(r'[(\b\d{1,3}\b,)+]',txt)
['(', '6', '4', '3', '8', ')', '4', '5', ',', '3', '1', ',', '5', '5', ',', '1', '1', '5', '3', ',', '6', '2', ',']

What should be the expression to capture the below output.

Expected output:

['[64]', '[38]', '[45, 31, 55, 11]', '[53, 62]']

Upvotes: 2

Views: 505

Answers (2)

The fourth bird
The fourth bird

Reputation: 163362

You can match 1-3 digits. Then repeat 0+ times matching a comma, 0+ spaces and again 1-3 digits.

\[\d{1,3}(?:, *\d{1,3})*]
  • \[ Match {
  • \d{1,3} Match 1-3 digits
  • (?: Non capture group
    • , *\d{1,3}
  • )* Close the group and repeat it 0+ times
  • ] Match ]

Regex demo | Python demo

Example

import re

txt = '''Automatic face localisation is the prerequisite step of facial image analysis for many applications such as facial attribute (e.g. expression [64] and age [38]) and facial identity
... recognition [45, 31, 55, 11]. A narrow definition of face localisation may refer to traditional face detection [53, 62],
... '''

print (re.findall(r'\[\d{1,3}(?:, *\d{1,3})*]',txt))

Output

['[64]', '[38]', '[45, 31, 55, 11]', '[53, 62]']

If there can be more digits and spaces on all sides, including continuing the sequence on a newline:

\[\s*\d+(?:\s*,\s*\d+)*\s*]

Regex demo

Upvotes: 2

user7571182
user7571182

Reputation:

You may try:

\[[\d, ]*?]

Explanation of the above regex:

Pictorial Representation

Please find the demo of the above regex in here.

Sample Implementation in python

import re

regex = r"\[[\d, ]*?]"

test_str = ("Automatic face localisation is the prerequisite step of facial image analysis for many applications such as facial attribute (e.g. expression [64] and age [38]) and facial identity\n"
    "... recognition [45, 31, 55, 11]. A narrow definition of face localisation may refer to traditional face detection [53, 62]")

print(re.findall(regex, test_str))
# Outputs: ['[64]', '[38]', '[45, 31, 55, 11]', '[53, 62]']

You can find the sample run of the above code in here.

Upvotes: 2

Related Questions