How to find patterns in text spanning multiple lines?

Question

I want to find indexed array elements that are grouped in comma-separated collections. The search should produce one list per collection, something like this (see the file data example below) -

[    'foo[0]',     'foo[1]',     'foo[2]', ...,     'foo[10]']
['foobar0[0]', 'foobar0[1]', 'foobar0[2]', ..., 'foobar0[98]']
[    'bas[0]',     'bas[1]',     'bas[2]', ...,     'bas[99]']

I have a text file where each collection is enclosed in {..} and may span multiple lines (as shown below) -

{foo[0], foo[1], foo[2], foo[3]...\n
foo[10]}, {fooba0[0], foobar0[1], foobar0[2],....\n
foobar0[98], foobar0[99]}, {bas[0], bas[1], bas[2]...\n
bas[99]}

The regular expression I am using to search for the array elements is -

re.findall(r'[a-z][A-Z]+[0-9]+\[[0-9]+\]', )
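Even with the brackets escaped as `\[` and `\]`, the pattern above matches nothing in the sample data: `[a-z][A-Z]+[0-9]+` asks for one lowercase letter, then one or more uppercase letters, then digits, and names such as `foo` or `foobar0` contain no uppercase letters. A minimal sketch comparing it with a looser element pattern; the shortened sample text, the variable names and the `[A-Za-z_]\w*` name part are illustrative assumptions, not from the original:

```python
import re

# Shortened sample in the question's layout (illustrative data).
text = """{foo[0], foo[1], foo[2],
foo[10]}, {foobar0[0], foobar0[1],
foobar0[99]}"""

# The question's pattern: one lowercase letter, one or more UPPERCASE
# letters, one or more digits, then [digits].
question_re = r'[a-z][A-Z]+[0-9]+\[[0-9]+\]'
print(re.findall(question_re, text))  # []

# Hypothetical looser pattern: an identifier-like name, then [digits].
element_re = r'[A-Za-z_]\w*\[[0-9]+\]'
print(re.findall(element_re, text))
# ['foo[0]', 'foo[1]', 'foo[2]', 'foo[10]', 'foobar0[0]', 'foobar0[1]', 'foobar0[99]']
```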

In yacc this would translate to something like -

array_element_token:     [a-z][A-Z]+[0-9]+\[[0-9]+\]
array_items_continued:   array_items_continued             |
                         array_element_token ',' 
arrays_items:            '{' array_items_continued array_element_token '},'

But I am not sure how to create the recursive rule using Python regular expressions.
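The yacc rules above describe repetition (a comma-separated list of elements inside braces) rather than nesting, so a regular expression can express them with a `*` quantifier; Python's built-in `re` module has no recursive patterns in any case. A minimal sketch of that mapping, assuming a group contains only `name[index]` items separated by commas and whitespace (the `...` ellipses in the sample data would make a group fail this stricter check). The names `ELEMENT`, `GROUP` and `parse_groups` are hypothetical:

```python
import re

# array_element_token: an identifier-like name followed by [digits]
ELEMENT = r'[A-Za-z_]\w*\[\d+\]'

# arrays_items: '{' (element ',')* element '}'
# The left-recursive array_items_continued rule becomes (?: ... )*.
# The trailing ',' after '}' is left out so the last group also matches.
GROUP = re.compile(
    r'\{\s*((?:' + ELEMENT + r'\s*,\s*)*' + ELEMENT + r')\s*\}'
)

def parse_groups(text):
    """Return one list of elements per well-formed {...} group."""
    return [re.findall(ELEMENT, m.group(1)) for m in GROUP.finditer(text)]

sample = """{foo[0], foo[1],
foo[10]}, {bas[0],
bas[1], bas[99]}"""

print(parse_groups(sample))
# [['foo[0]', 'foo[1]', 'foo[10]'], ['bas[0]', 'bas[1]', 'bas[99]']]
```

Because `\s` also matches line breaks, a group can span several lines without any special flag.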

Wiktor Stribiżew · Accepted Answer

You may use

import re

s = r"""{foo[0], foo[1], foo[2], foo[3]...\n
foo[10]}, {fooba0[0], foobar0[1], foobar0[2],....\n
foobar0[98], foobar0[99]}, {bas[0], bas[1], bas[2]...\n
bas[99]}"""
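results = []
# Pass 1: each {...} group, line breaks included
for m in re.findall(r'{[^{}]*}', s):
    # Pass 2: each name[index] element inside the current group
    results.append(re.findall(r'\w+\[\d+]', m))
print(results)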

Running this gives the following results: [['foo[0]', 'foo[1]', 'foo[2]', 'foo[3]', 'foo[10]'], ['fooba0[0]', 'foobar0[1]', 'foobar0[2]', 'foobar0[98]', 'foobar0[99]'], ['bas[0]', 'bas[1]', 'bas[2]', 'bas[99]']].

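The {[^{}]*} regex extracts every substring that starts with {, ends with } and contains no other braces in between (the negated character class [^{}] also matches line breaks, so a group may span several lines). Then \w+\[\d+] extracts, from each group, all substrings that match the following sequences: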

\w+ - 1+ letters, digits, _ chars
\[ - a [ char
\d+ - 1+ digits
] - a ] char.
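The element pattern can also be written with `re.VERBOSE`, so that each component from the list above carries its explanation as a comment; it behaves the same as `\w+\[\d+]`. The sketch also compares the answer's `{[^{}]*}` with a `{.*?}` pattern, which is a hypothetical alternative not used in the answer: `.` does not match a newline unless `re.DOTALL` is set. The names `ELEMENT_VERBOSE` and `text` are illustrative:

```python
import re

# \w+\[\d+] with one comment per component.
ELEMENT_VERBOSE = re.compile(r"""
    \w+    # one or more letters, digits or underscores (the name)
    \[     # a literal [
    \d+    # one or more digits (the index)
    ]      # a literal ] (no escape needed outside a character class)
""", re.VERBOSE)

text = "{foo[0], foo[1],\nfoo[10]}"
print(ELEMENT_VERBOSE.findall(text))  # ['foo[0]', 'foo[1]', 'foo[10]']

# The negated class crosses the line break; '.' only does with re.DOTALL.
print(re.findall(r'{[^{}]*}', text))          # ['{foo[0], foo[1],\nfoo[10]}']
print(re.findall(r'{.*?}', text))             # []
print(re.findall(r'{.*?}', text, re.DOTALL))  # ['{foo[0], foo[1],\nfoo[10]}']
```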
