Regex: findall all substrings in text

Question

I have a long text; here is part of it:

C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].

I need to find all substrings like these:

[03_SNYuLOOO IC "Story Group".]
[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B]

I tried to use

re.findall(r'^\[\d{2}_[\s\S]+\]$', text)

But it returns an empty list. What am I doing wrong?

Wiktor Stribiżew · Accepted Answer

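The ^ and $ anchors require the whole string to match the pattern, from its very start to its very end. Your sample does not start with [ and does not end with ], so nothing matches. Even without the anchors, [\s\S]+ matches one or more of any chars, as many as possible, grabbing any [ and ] on its way to the end of the string, so the final \] would match the rightmost ] in the string and you would get one long match instead of separate ones.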
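To make both failure modes visible, here is a minimal sketch. The text variable is a shortened, made-up stand-in for the real input (an assumption, not the original data); the first pattern is the one from the question, the second is the same pattern with the anchors dropped.

```python
import re

# Shortened, hypothetical stand-in for the real text (not the original data).
text = 'C: name: [03_AAA "X".]\n). - [04_BBB: Moscow,\nfloor 3. Office B].'

# Anchored pattern: without re.MULTILINE, ^ only matches at position 0,
# where the text has 'C' rather than '[', so there is no match at all.
anchored = re.findall(r'^\[\d{2}_[\s\S]+\]$', text)
print(anchored)  # []

# Anchors removed: greedy [\s\S]+ runs to the end of the string and
# backtracks only to the LAST ']', so both blocks merge into one match.
greedy = re.findall(r'\[\d{2}_[\s\S]+\]', text)
print(len(greedy))  # 1
```

The single greedy match starts at [03_ and ends at the closing bracket after Office B, which is why a negated character class is the safer choice here.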

You may use the following regex:

r'\[\d{2}_[^]]+]'

See the regex demo

Details

\[ - a literal [
\d{2} - two digits
_ - an underscore
[^]]+ - one or more chars other than ]
] - a literal ].
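The same breakdown can be written directly into the pattern with re.VERBOSE, which some people find easier to maintain. This is only a sketch: the names pattern and sample are made up for illustration, and the sample string is invented to show that the match may span a newline and that a bracket without two leading digits is skipped.

```python
import re

# Same regex as above, spelled out with inline comments via re.VERBOSE.
pattern = re.compile(r'''
    \[        # a literal [
    \d{2}     # two digits
    _         # an underscore
    [^]]+     # one or more chars other than ] (newlines included)
    ]         # a literal ]
''', re.VERBOSE)

# Invented sample: one match spanning a newline, one non-match, one plain match.
sample = 'a [01_x\ny] b [zz_no] c [02_ok]'
result = pattern.findall(sample)
print(result)  # ['[01_x\ny]', '[02_ok]']
```

Because [^]] matches any char except ], line breaks inside the brackets are matched without needing re.DOTALL.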

See the Python demo:

import re
s='''C: state name of the Company in Russian: [03_SNYuLOOO IC "Story Group".]
). - [04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, 
ul. Krasnobogatyrskaya, 2, is built.
2, floor 3. com. 11. Office B].'''
print(re.findall(r'\[\d{2}_[^]]+]', s))
# => ['[03_SNYuLOOO IC "Story Group".]', '[04_MNMestablishment of the Company: 107S64, Russian Federation, Moscow, \nul. Krasnobogatyrskaya, 2, is built.\n2, floor 3. com. 11. Office B]']
