I want to look for indexed array elements that are grouped in collections (comma separated), and the search should return something like this (see the file data example below):
['foo[0]', 'foo[1]', 'foo[2]', ..., 'foo[10]']
['foobar0[0]', 'foobar0[1]', 'foobar0[2]', ..., 'foobar0[98]']
['bas[0]', 'bas[1]', 'bas[2]', ..., 'bas[99]']
I have a text file where each "collection" can span multiple lines, and the collections are delimited by {...} (as shown below):
{foo[0], foo[1], foo[2], foo[3]...\n
foo[10]}, {fooba0[0], foobar0[1], foobar0[2],....\n
foobar0[98], foobar0[99]}, {bas[0], bas[1], bas[2]...\n
bas[99]}
The regular expression I am using to search for the array elements is:
re.findall('[a-z][A-Z]+[0-9]+\[[0-9]+\]', <list item>)
In yacc, this would translate to something like:
array_element_token: [a-z][A-Z]+[0-9]+\[[0-9]+\]
array_items_continued: array_items_continued array_element_token ','
                     | /* empty */
arrays_items: '{' array_items_continued array_element_token '}' ','
But I am not sure how to express this recursive rule using Python regular expressions.
You may use a two-step approach: first extract each {...} collection, then extract the array elements from it:
import re
s = r"""{foo[0], foo[1], foo[2], foo[3]...\n
foo[10]}, {fooba0[0], foobar0[1], foobar0[2],....\n
foobar0[98], foobar0[99]}, {bas[0], bas[1], bas[2]...\n
bas[99]}"""
results = []
matches = re.findall(r'{[^{}]*}', s)
for m in matches:
    # Collect the array elements of each {...} collection as a separate list
    results.append(re.findall(r'\w+\[\d+]', m))
With the sample input above, results is [['foo[0]', 'foo[1]', 'foo[2]', 'foo[3]', 'foo[10]'], ['fooba0[0]', 'foobar0[1]', 'foobar0[2]', 'foobar0[98]', 'foobar0[99]'], ['bas[0]', 'bas[1]', 'bas[2]', 'bas[99]']].
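As a quick check, the pattern from the question can be run next to \w+\[\d+] on a shortened, one-line version of the sample data. [a-z][A-Z]+[0-9]+ requires one lowercase letter followed by at least one uppercase letter and at least one digit before the [, and since re is case-sensitive by default, it finds nothing here:

```python
import re

# Shortened, one-line version of the sample data from the question.
text = "{foo[0], foo[1], foo[10]}, {foobar0[98], foobar0[99]}"

# Pattern from the question: one lowercase letter, then 1+ UPPERCASE letters,
# then 1+ digits, then [digits]. No element in the sample has an uppercase letter.
question_pattern = r'[a-z][A-Z]+[0-9]+\[[0-9]+\]'

# Pattern from the answer: 1+ word chars (letters, digits, _), then [digits].
answer_pattern = r'\w+\[\d+]'

print(re.findall(question_pattern, text))  # []
print(re.findall(answer_pattern, text))
# ['foo[0]', 'foo[1]', 'foo[10]', 'foobar0[98]', 'foobar0[99]']
```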
The {[^{}]*} regex extracts all substrings that start with { and end with } and contain no other braces, and then \w+\[\d+] extracts from each of them all substrings that match the following sequence:
- \w+ - 1+ letters, digits or _ chars
- \[ - a [ char
- \d+ - 1+ digits
- ] - a ] char.
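Python's built-in re module has no recursive patterns, but the left-recursive list rule from the yacc grammar in the question does not actually need recursion: a run of comma-terminated elements followed by a final element is plain repetition, which a repeated non-capturing group can express. The sketch below assumes clean input without the "..." placeholders used in the question's sample; ELEMENT, COLLECTION and parse_collections are illustrative names only:

```python
import re

# One array element, e.g. foo[0] or foobar0[98].
ELEMENT = r'\w+\[\d+]'

# The list rule "items: items element ',' | empty" followed by a final element
# is just repetition, so it maps onto (?:element ,)* element between braces.
COLLECTION = re.compile(
    r'\{\s*(?:' + ELEMENT + r'\s*,\s*)*' + ELEMENT + r'\s*\}'
)

def parse_collections(text):
    """Hypothetical helper: return the elements of each well-formed {...} collection."""
    return [re.findall(ELEMENT, m.group()) for m in COLLECTION.finditer(text)]

# Hypothetical clean input, with collections split over several lines.
data = """{foo[0], foo[1],
foo[2]}, {bas[0],
bas[99]}"""

print(parse_collections(data))
# [['foo[0]', 'foo[1]', 'foo[2]'], ['bas[0]', 'bas[99]']]

# A collection with a trailing comma does not fit the grammar.
print(COLLECTION.fullmatch('{foo[0], foo[1],}'))  # None
```

Unlike {[^{}]*}, this pattern only accepts collections that fit the grammar, so a malformed collection (such as one with a trailing comma) is skipped instead of being extracted anyway.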