Reputation: 510
I am trying to extract tokens from a string, such that these tokens meet certain conditions. In my particular case, I want to extract symbols such as +,=,-, etc.
I have created the following regex:
reg = re.compile(r"[\{\}\(\)\[\]\.,;\+\-\*\/\&\|<>=~]")
However, when I apply:
reg.findall('x += "hello + world"')
It also matches the + between quotes, so it outputs:
['+', '=', '+']
My expected output is:
['+', '=']
My question is, how do I achieve this? Is it even possible? I have been searching the internet, but I only found how to match everything except double quotes, and similar cases.
Upvotes: 1
Views: 515
Reputation: 43169
First, you do not need to escape every special character in a character class (leaving aside [ and ]). So your initial expression becomes something like:
[-\[\]{}().,;+*/&|<>=~]
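As a quick sanity check, the following sketch compares the fully escaped class from the question with the shorter one on a sample string. The variable names and the sample text are illustrative assumptions, not part of the original posts.

```python
import re

# Pattern from the question: every symbol escaped.
escaped = re.compile(r"[\{\}\(\)\[\]\.,;\+\-\*\/\&\|<>=~]")
# Shorter form: "-" placed first, only the square brackets escaped.
simplified = re.compile(r"[-\[\]{}().,;+*/&|<>=~]")

# Hypothetical sample containing each of the symbols once.
sample = "{}()[].,;+-*/&|<>=~ abc 123"

escaped_hits = escaped.findall(sample)
simplified_hits = simplified.findall(sample)

# Both patterns should pick out exactly the same characters.
print(escaped_hits == simplified_hits)
```

Placing the hyphen first in the class is what lets it go unescaped; anywhere else between two characters it would be read as a range.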
Now to the second requirement: matching only in certain positions (and leaving the rest as it is). Here, you could either use the newer regex module and write (demo on regex101.com):
"[^"]+"(*SKIP)(*FAIL)|[-\[\]{}().,;+*/&|<>=~]
Or stick with the re module and some programming logic:
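import re
# Either match a quoted string (and discard it) or capture a symbol in group 1.
# [^"]* is used instead of [^"]+ so that an empty string "" is skipped as well.
rx = re.compile(r'"[^"]*"|([-\[\]{}().,;+*/&|<>=~])')
string = 'x += "hello + world"'
# Keep only the matches in which group 1 actually took part.
symbols = [match.group(1) for match in rx.finditer(string) if match.group(1)]
print(symbols)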
This yields
['+', '=']
The idea here is:
match_this_but_dont_save_it | (keep_this)
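To see why the filtering step matters with this skip-or-capture shape, here is a small sketch using findall instead of finditer. The names raw and kept are made up for illustration, and the quoted-string part uses [^"]* so that empty strings are skipped too, which is an assumption on my side.

```python
import re

# skip quoted strings | (capture symbols)
rx = re.compile(r'"[^"]*"|([-\[\]{}().,;+*/&|<>=~])')
text = 'x += "hello + world"'

# With one capture group, findall returns that group for every match;
# matches where the group did not take part come back as ''.
raw = rx.findall(text)

# Dropping the empty entries leaves only the symbols outside quotes.
kept = [s for s in raw if s]

print(raw)   # ['+', '=', '']
print(kept)  # ['+', '=']
```

The quoted string is still matched, which is what consumes the inner + so the second alternative never sees it; it simply contributes nothing to the kept list.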
You might want to read more on the (*SKIP)(*FAIL) construct.
Upvotes: 1
Reputation: 917
I think one thing you can do is stop checking the regex once a " appears, and not resume checking until the next occurrence of " comes.
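A minimal sketch of that idea without any regex, assuming the same symbol set as in the question and assuming quotes are never escaped inside a string (the function name symbols_outside_quotes is hypothetical):

```python
# Symbols of interest, taken from the character class in the question.
SYMBOLS = set("{}()[].,;+-*/&|<>=~")

def symbols_outside_quotes(text):
    """Collect symbols, ignoring everything between double quotes."""
    found = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # Every double quote toggles the "inside a string" state.
            in_quotes = not in_quotes
        elif not in_quotes and ch in SYMBOLS:
            found.append(ch)
    return found

result = symbols_outside_quotes('x += "hello + world"')
print(result)  # ['+', '=']
```

This would need extra handling for escaped quotes such as \" or for single-quoted strings, so treat it as a starting point only.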
Upvotes: 0