Petr Petrov

Reputation: 4452

Regex: find all bracketed substrings in text with re.findall

I have a long text. Here is part of it:

C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].

I need to find all substrings like these:

[03_SNYuLOOO IC "Story Group".]
[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B]

I tried:

re.findall(r'^\[\d{2}_[\s\S]+\]$', text)

But it returns an empty list. What am I doing wrong?

Upvotes: 1

Views: 109

Answers (1)

Wiktor Stribiżew

Reputation: 627507

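The ^ and $ anchors require the whole string to match the pattern, so the string would have to start with [ and end with ]. Your text starts with other characters, which is why you get an empty list. Even without the anchors, [\s\S]+ matches one or more of any chars, as many as possible, grabbing every [ and ] on its way to the end of the string, so the final \] matches the rightmost ] in the string and you get one long match instead of two.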
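Both effects can be seen on a short string. This is only a minimal sketch: the sample text and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample: two bracketed items inside a longer string.
text = 'name: [03_first] and [04_second].'

# Anchored pattern: the string itself must begin with "[" and end with "]".
anchored = re.findall(r'^\[\d{2}_[\s\S]+\]$', text)

# Anchors removed, but [\s\S]+ is still greedy and runs to the last "]".
greedy = re.findall(r'\[\d{2}_[\s\S]+\]', text)

print(anchored)  # []
print(greedy)    # ['[03_first] and [04_second]']
```

The second result is a single match that swallows both items, which is why the quantified part has to be stopped at the first ].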

You may use the following regex:

r'\[\d{2}_[^]]+]'

See the regex demo

Details

  • \[ - a literal [
  • \d{2} - two digits
  • _ - an underscore
  • [^]]+ - one or more chars other than ]
  • ] - a literal ].
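The same breakdown can be written as a commented pattern with re.VERBOSE, which may be easier to maintain. This is a sketch on a made-up sample string. The lazy variant [\s\S]+? is shown only for comparison and is not part of the answer above; it gives the same result on this input.

```python
import re

# The pattern from the list above, one piece per line.
pattern = re.compile(r'''
    \[       # a literal [
    \d{2}    # two digits
    _        # an underscore
    [^]]+    # one or more chars other than ]
    ]        # a literal ]
''', re.VERBOSE)

# Hypothetical sample; the last bracket has no two-digit prefix.
sample = 'a [03_x y] b [04_line1\nline2] c [ab_no]'

negated = pattern.findall(sample)
# Alternative for comparison: a lazy quantifier stops at the first ].
lazy = re.findall(r'\[\d{2}_[\s\S]+?\]', sample)

print(negated)  # ['[03_x y]', '[04_line1\nline2]']
print(lazy == negated)  # True
```

Note that [^]] also matches line breaks, so no extra flag is needed for matches that span several lines.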

See the Python demo:

import re
s='''C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].'''
print(re.findall(r'\[\d{2}_[^]]+]', s))
# => ['[03_SNYuLOOO IC "Story Group".]', '[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, \nul. Krasnobogatyrskaya, 2, is built.\n2, floor 3. com. 11. Office B]']
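As the output shows, matches that span several lines keep their line breaks. If those are not wanted, one possible follow-up step (an assumption about what you need, not something the question asks for) is to collapse whitespace runs after matching. The sample string below is hypothetical.

```python
import re

# Hypothetical sample with a match that spans two lines.
s = 'x [03_one]\ny [04_two,\nthree  four] z'

matches = re.findall(r'\[\d{2}_[^]]+]', s)

# Collapse every whitespace run (spaces, line breaks) into one space.
flattened = [' '.join(m.split()) for m in matches]

print(flattened)  # ['[03_one]', '[04_two, three four]']
```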

Upvotes: 2
