fcimeson

Reputation: 123

Nested regular expression strings

I'm having trouble writing a regular expression for the following. I have vectors of literals (see RE_LIT below) and I would like to find every such vector in a line of text. Specifically, I seem to have issues with the parentheses acting as capturing groups rather than as plain grouping.

RE_LABEL1 = r'[cvx]\d+(?![.]r)$'
RE_LABEL2 = r'v\d+\.r\d+'
RE_LABEL = r'(%s)|(%s)' % (RE_LABEL1, RE_LABEL2)
RE_LIT = r'!?%s' % RE_LABEL
RE_VEC = r'\[\s*(\s*%s\s*,?\s*)+\s*\]' % RE_LIT

Example string to match:

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'

Expected results:

> print re.findall(RE_VEC, test)
['[v3,v4,v5.r1,!v6,v7,x8,v9,v10]', '[v1, v2]']
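For reference, a minimal sketch of the symptom, using a simplified `\w+` element pattern instead of the RE_* patterns above (the pattern and the sample text are assumptions for illustration only). When a pattern contains a capturing group, `re.findall` returns what the group captured rather than the whole match; with `(?:...)` it returns the whole match.

```python
import re

text = '[v1, v2] [x3]'

# Capturing group: findall returns the group's content (the last repetition),
# not the whole bracketed vector.
capturing = re.findall(r'\[(\w+,?\s*)+\]', text)

# Non-capturing group (?:...): findall returns the whole match.
non_capturing = re.findall(r'\[(?:\w+,?\s*)+\]', text)

print(capturing)      # ['v2', 'x3']
print(non_capturing)  # ['[v1, v2]', '[x3]']
```

With more than one capturing group, as in RE_LABEL above, `re.findall` returns a list of tuples, one entry per group.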

Thank you in advance for your help.

Upvotes: 2

Views: 108

Answers (2)

fcimeson

Reputation: 123

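import re

# Each label must be followed by a space, a comma, or the closing bracket.
RE_LABEL1 = r'[cvx]\d+(?=[ ,\]])'
RE_LABEL2 = r'v\d+\.r\d+(?=[ ,\]])'
# Non-capturing group so the optional "!" and the whitespace apply to both forms.
RE_LABEL = r'(?:%s|%s)' % (RE_LABEL1, RE_LABEL2)
RE_LIT = r'!?%s' % RE_LABEL
RE_VEC = r'\[\s*(?:(?:\s*%s\s*),?)+\s*\]' % RE_LIT
test = '[v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [v1, x2.r2]'
print(re.findall(RE_VEC, test))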

Thank you, stribizhev, for your help; it got me halfway there. The above is the final solution.
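A possible variant of the same idea, sketched under assumptions rather than taken from the answer: try the longer `v\d+\.r\d+` form first and require a comma between literals, so the brackets and separators do the anchoring and no lookahead is needed. The names `LABEL`, `LIT`, and `VEC` are illustrative, and the strict comma requirement is an assumption (the patterns above treat the comma as optional).

```python
import re

# Longer alternative first, so "v5.r1" is not cut short at "v5".
LABEL = r'(?:v\d+\.r\d+|[cvx]\d+)'
# The optional "!" applies to either label form because LABEL is grouped.
LIT = r'!?' + LABEL
# One literal, then zero or more ", literal" repetitions, inside brackets.
VEC = r'\[\s*%s(?:\s*,\s*%s)*\s*\]' % (LIT, LIT)

test = 'c1 = blah([v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2], [x5.r1])'
result = re.findall(VEC, test)
print(result)
# ['[v3,v4,v5.r1,!v6,v7,x8,v9,v10]', '[v1, v2]']
```

Because every group is non-capturing, `re.findall` returns the full bracketed vectors, and `[x5.r1]` is skipped since `x5` is followed by neither a comma nor the closing bracket.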

Upvotes: 0

Wiktor Stribiżew

Reputation: 627469

You can use the following fix:

import re
RE_LABEL1 = r'[cvx]\d+(?![.]r)'
RE_LABEL2 = r'v\d+\.r\d+'
RE_LABEL = r'%s|%s' % (RE_LABEL1, RE_LABEL2)
RE_LIT = r'!?%s' % RE_LABEL
# Non-capturing groups only, so findall returns each whole match.
RE_VEC = r'(?:(?:%s),?\s*)+' % RE_LIT
test = '[v3,v4,v5.r1,!v6,v7,x8,v9,v10], [v1, v2]'
print(re.findall(RE_VEC, test))

Output of an IDEONE demo:

['v3,v4,v5.r1,!v6,v7,x8,v9,v10', 'v1, v2']
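One caveat worth checking, shown as a hedged sketch: the `(?![.]r)` lookahead in RE_LABEL1 can be satisfied after `\d+` backtracks to a shorter number, so a multi-digit label followed by `.r` may be matched partially. The `LABEL1_STRICT` variant below is a hypothetical adjustment, not part of the answer above; it also forbids a following digit.

```python
import re

# Lookahead as used in RE_LABEL1 above.
LABEL1 = r'[cvx]\d+(?![.]r)'
# Hypothetical variant: also reject a following digit, so \d+ cannot
# backtrack to a shorter number that satisfies the lookahead.
LABEL1_STRICT = r'[cvx]\d+(?![.]r|\d)'

partial = re.findall(LABEL1, 'x15.r1')
strict = re.findall(LABEL1_STRICT, 'x15.r1')
plain = re.findall(LABEL1_STRICT, 'x15, v2')

print(partial)  # ['x1']
print(strict)   # []
print(plain)    # ['x15', 'v2']
```

For single-digit labels such as `v5.r1` the two patterns behave the same, which is why the demo output above is unaffected.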

Upvotes: 1
