Reputation: 1200
So I have a quick question that I cannot figure out.
I have some lines that I want to parse, for example:
a = a/2;
b*= a/4*2;
float c += 4*2*sin(2);
What I want is to get the name of the variable being assigned. So, in this case I would like to retrieve a, b, and c.
I have the following regex:
match = re.search(r'\b(?:float)?(.*)(?:(\+|-|\*|\\)? =)',line)
When I print out match.group(1), it returns "a", "b *", and "c +".
I cannot figure out why it also captures the operator before the =. Could someone explain?
Upvotes: 3
Views: 464
Reputation: 43497
I think it can be a much simpler regular expression.
First of all, your variable names can only contain word characters (letters, digits, and underscores); I have yet to see a variable that is anything else.
So your capturing group already looks like this: (\w+)
Then, if the only thing that can come before that is float, that part should look like this: \b(?:float\s+)?
But really, that's all we need.
The only thing missing is reading to the end of the line when you process the whole text at once; if you read one line at a time, it isn't needed: .*\n
So your whole pattern can be: \b(?:float\s+)?(\w+).*\n
Once the regular expression reaches a non-word character, such as a space, an '=' sign, or anything else, that character is no longer part of the capture group.
:)
Running the regex I mentioned on your example:
>>> import re
>>> re.findall(r'\b(?:float\s+)?(\w+).*\n', "a = a/2;\nb*= a/4*2;\nfloat c += 4*2*sin(2);\n")
['a', 'b', 'c']
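One caveat with .*\n: a final line that has no trailing newline is silently skipped. A minimal sketch of an alternative, assuming the whole file has been read into one string: with re.MULTILINE, ^ matches at the start of every line, so no newline has to be consumed. The leading \s* (to tolerate indented lines) is an added assumption, not part of the pattern above.

```python
import re

text = "a = a/2;\nb*= a/4*2;\nfloat c += 4*2*sin(2);"  # note: no trailing newline

# Pattern from above: every match must end with "\n", so the last line is missed.
with_newline = re.findall(r'\b(?:float\s+)?(\w+).*\n', text)

# re.MULTILINE makes ^ match at the start of each line, so no "\n" is required.
# \s* (an assumption) also accepts indented lines.
per_line = re.findall(r'^\s*(?:float\s+)?(\w+)', text, re.MULTILINE)

print(with_newline)  # ['a', 'b']
print(per_line)      # ['a', 'b', 'c']
```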
And running it on one line at a time (the ^ tells the regular expression to start at the beginning of the string):
>>> re.findall(r'^(?:float\s+)?(\w+)', "a = a/2")
['a']
>>> re.findall(r'^(?:float\s+)?(\w+)', "b*= a/4*2")
['b']
>>> re.findall(r'^(?:float\s+)?(\w+)', "float c += 4*2*sin(2)")
['c']
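When the input comes from a file, the per-line pattern can be applied to each line as it is read. A minimal sketch; the helper name assigned_names is hypothetical, and io.StringIO stands in for a real file object. Like the pattern above, it only grabs the first identifier on the line and does not check that the line actually contains an assignment.

```python
import io
import re

# Same per-line pattern as above; re.match already anchors at the start,
# so the leading ^ is not needed here.
NAME_RE = re.compile(r'(?:float\s+)?(\w+)')

def assigned_names(lines):
    """Hypothetical helper: first identifier on each non-blank line."""
    names = []
    for line in lines:
        m = NAME_RE.match(line.strip())  # strip() drops indentation and "\n"
        if m:  # blank lines produce no match and are skipped
            names.append(m.group(1))
    return names

# io.StringIO stands in for a file opened with open(), which also yields lines one by one.
source = io.StringIO("a = a/2;\nb*= a/4*2;\n\nfloat c += 4*2*sin(2);\n")
print(assigned_names(source))  # ['a', 'b', 'c']
```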
Upvotes: 0
Reputation: 38456
You have a preceding greedy capture with the (.*), and you're allowing your operator capture to be optional (with the trailing ?). With this, the greedy capture is the one that brings in the operator, instead of letting it fall through to the group matching the =.
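A minimal demonstration of this behaviour, using a simplified pattern rather than the asker's exact regex: the greedy (.*) backtracks only as far as it must for = to match, and the optional operator group is then satisfied by matching nothing. For contrast, the sketch also shows a lazy (.*?), which is a different technique from the fix suggested below, letting the operator group claim the +.

```python
import re

line = "c+=1"

# Greedy: (.*) first takes everything, then gives back characters only until
# "=" can match. It stops at "c+", and (\+)? matches the empty string.
greedy = re.search(r'(.*)(\+)?=', line)
print(greedy.groups())  # ('c+', None)

# Lazy: (.*?) grows one character at a time, so the first successful match
# is "c" followed by "+" in the operator group.
lazy = re.search(r'(.*?)(\+)?=', line)
print(lazy.groups())  # ('c', '+')
```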
Try changing the greedy capture to match only what is acceptable there. From the looks of it, that could only be alphanumeric characters and spaces (numeric is a guess, so that could be dropped if not needed):
\b(?:float\s+)?([a-zA-Z0-9]+)\s*(?:(\+|-|\*|/)?\s*=)
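A quick check of this approach on the three example lines. The sketch spells out the pattern with / as the division operator (in a raw string, \\ matches a literal backslash, not a slash) and with \s* before the =, since b*= has no space there; both details are assumptions about the intended input. Group 1 holds the name, and group 2 holds the compound operator, or None for a plain =.

```python
import re

# Name, optional whitespace, optional compound operator, optional whitespace, "=".
ASSIGN_RE = re.compile(r'\b(?:float\s+)?([a-zA-Z0-9]+)\s*(?:(\+|-|\*|/)?\s*=)')

lines = ["a = a/2;", "b*= a/4*2;", "float c += 4*2*sin(2);"]
for line in lines:
    m = ASSIGN_RE.search(line)
    print(m.group(1), m.group(2))
# a None
# b *
# c +
```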
Upvotes: 2