overloading

Reputation: 1200

regex capture group

So I have a quick question that I cannot figure out.

I have some lines that I want to parse, for example:

What I want is to get the name of the variable being assigned in each line. So, in this case, I would like to retrieve a, b, and c.

I have the following regex:

match = re.search(r'\b(?:float)?(.*)(?:(\+|-|\*|\\)? =)',line)

When I print match.group(1), it returns a, b *, and c +.

I cannot figure out why it also captures the operator before the =. Could someone explain?

Upvotes: 3

Views: 464

Answers (2)

Inbar Rose

Reputation: 43497

I think this can be done with a much simpler regular expression.

First of all, your variable names can only be alphanumeric (plus underscores); I have yet to see a variable name made of anything else.

So your capturing group can simply look like this: (\w+)

Then, if the only thing that can come before the name is float, the prefix should indeed look like this: \b(?:float\s+)? (the \s+ consumes the space after float).

But really, that's all we need.

The only thing missing is to consume the rest of the line. That is needed if you match all the lines at once, but not if you handle each line as it comes: .*\n

So the whole expression can be:

\b(?:float\s+)?(\w+).*\n

Once the regular expression reaches a character that is not alphanumeric, such as a space, an = sign, or anything else, that character is no longer part of the capture group.

:)

Running the regex I mentioned on your example:

>>> import re
>>> re.findall(r'\b(?:float\s+)?(\w+).*\n', "a = a/2;\nb*= a/4*2;\nfloat c += 4*2*sin(2);\n")
['a', 'b', 'c']

And running it on one line at a time (^ anchors the match to the beginning of the string):

>>> re.findall(r'^(?:float\s+)?(\w+)', "a = a/2")
['a']
>>> re.findall(r'^(?:float\s+)?(\w+)', "b*= a/4*2")
['b']
>>> re.findall(r'^(?:float\s+)?(\w+)', "float c += 4*2*sin(2)")
['c']
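One caveat with the trailing .*\n: if the last line has no newline after it, that line is skipped. A minimal sketch of an alternative, assuming the input is a single string: use ^ with re.MULTILINE so each line is matched from its start and nothing after the name needs to be consumed. The input string below is the same example with the final newline removed (an assumption for illustration).

```python
import re

# Same lines as above, but with no newline after the last statement.
source = "a = a/2;\nb*= a/4*2;\nfloat c += 4*2*sin(2);"

# The .*\n version needs a newline after every statement, so "c" is missed.
with_newline = re.findall(r'\b(?:float\s+)?(\w+).*\n', source)

# With re.MULTILINE, ^ matches at the start of every line, so the rest of
# each line does not have to be consumed at all.
anchored = re.findall(r'^\s*(?:float\s+)?(\w+)', source, re.MULTILINE)

print(with_newline)  # ['a', 'b']
print(anchored)      # ['a', 'b', 'c']
```

The \s* after ^ is an extra assumption so that indented lines still match.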

Upvotes: 0

newfurniturey

Reputation: 38456

You have a greedy capture, (.*), in front of your operator group, and you made the operator group optional (with the trailing ?). Because .* takes as much as it can, and the optional group is allowed to match nothing, the greedy capture ends up taking the operator instead of letting it fall through to the operator group in front of the =.
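A minimal sketch of this effect, using a simplified version of the question's pattern. Assumptions: the operators are written as the character class [-+*/] (with / for division, where the question's pattern has \\), and no space is required before the =. For contrast it also shows a lazy (.*?) capture, which is a different fix from the one suggested below.

```python
import re

line = "b *= a/4*2;"

# Greedy (.*) first swallows the whole line, then gives characters back one
# at a time until "=" can match. The optional operator group is allowed to
# match nothing, so the "*" is left inside group 1.
greedy = re.search(r'(.*)([-+*/])?=', line)

# Lazy (.*?) grows one character at a time instead, so the operator is still
# available for group 2 by the time "=" is reached.
lazy = re.search(r'(.*?)\s*([-+*/])?=', line)

print(repr(greedy.group(1)), greedy.group(2))  # 'b *' None
print(repr(lazy.group(1)), lazy.group(2))      # 'b' *
```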

Try restricting that capture to only what is acceptable there. From the looks of it, that is only alphanumeric characters, followed by optional spaces (the digits are a guess, so they can be dropped if not needed):

\b(?:float\s+)?([a-zA-Z0-9]+)\s*(?:(\+|-|\*|\\)? =)
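A sketch of the restricted-capture idea applied to each example line. It is adapted from the pattern above under two assumptions: / is used in the operator class in place of \\ (assuming division was intended), and the space before = is not required, so that inputs like "b *= ..." match.

```python
import re

# Adapted pattern: the name capture only accepts letters and digits, so it
# cannot swallow the operator; "/" and the optional space before "=" are
# assumptions about the input format.
pattern = re.compile(r'\b(?:float\s+)?([a-zA-Z0-9]+)\s*([-+*/])?=')

lines = ["a = a/2;", "b *= a/4*2;", "float c += 4*2*sin(2);"]

# Each result is (variable name, operator or None).
results = [pattern.search(line).groups() for line in lines]
print(results)  # [('a', None), ('b', '*'), ('c', '+')]
```

Because the name group cannot match * or +, the operator now lands in its own group instead of in the name.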

Upvotes: 2
