dpalma

Reputation: 510

Python Regex matching certain characters that are not between quotes

I am trying to extract tokens from a string that meet certain conditions. In my particular case, I want to extract symbols such as +, =, -, etc.

I have created the following regex:

import re

reg = re.compile(r"[\{\}\(\)\[\]\.,;\+\-\*\/\&\|<>=~]")

However, when I apply:

reg.findall('x += "hello + world"')

It also matches the + between quotes, so it outputs:

['+', '=', '+']

My expected output is:

['+', '=']

My question is: how do I achieve this? Is it even possible? I have been searching the internet, but have only found how to match everything except double quotes, and similar patterns.

Upvotes: 1

Views: 515

Answers (2)

Jan

Reputation: 43169

First, you do not need to escape most special characters inside a character class ([ and ] aside; a - is taken literally when it comes first). So your initial expression becomes something like:

[-\[\]{}().,;+*/&|<>=~]
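As a quick check (a sketch, not part of the original answer), the snippet below compares which printable ASCII characters the fully escaped class from the question and the simplified class accept. The helper name `accepted` is illustrative only.

```python
import re
import string

# Pattern from the question, with every symbol escaped.
escaped = re.compile(r"[\{\}\(\)\[\]\.,;\+\-\*\/\&\|<>=~]")
# Simplified pattern: only [ and ] are escaped; '-' is literal because it comes first.
simplified = re.compile(r"[-\[\]{}().,;+*/&|<>=~]")


def accepted(pattern):
    """Return the set of printable ASCII characters the pattern matches on its own."""
    return {c for c in string.printable if pattern.fullmatch(c)}


print(accepted(escaped) == accepted(simplified))  # True
```

Both classes accept exactly the same 19 symbols, so the backslashes in the original were redundant.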

Now to the second requirement: matching only in certain positions (and leaving the others alone). Here, you could either use the newer, third-party regex module and write (demo on regex101.com):

"[^"]*"(*SKIP)(*FAIL)|[-\[\]{}().,;+*/&|<>=~]


Or use a capturing group with the built-in re module and a little programming logic:

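import re

# Quoted strings are matched but not captured; symbols are captured in group 1.
# [^"]* (not +) lets an empty string "" be skipped correctly as well.
rx = re.compile(r'"[^"]*"|([-\[\]{}().,;+*/&|<>=~])')

string = 'x += "hello + world"'

# group(1) is None for quoted-string matches, so those are filtered out.
symbols = [match.group(1) for match in rx.finditer(string) if match.group(1)]
print(symbols)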


Both will yield

['+', '=']


Both approaches follow the same mechanism:

match_this_but_dont_save_it | (keep_this)
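The same mechanism can be wrapped in a small helper. This is a sketch: the function name `symbols_outside_quotes` is hypothetical, and it additionally assumes that strings may contain backslash-escaped quotes such as \", which the simpler "[^"]*" pattern does not handle.

```python
import re

SYMBOLS = r"[-\[\]{}().,;+*/&|<>=~]"
# Quoted string: an opening quote, then any run of escaped characters (\x) or
# characters that are neither a quote nor a backslash, then a closing quote.
# Handling backslash escapes is an assumption; drop it if your input has none.
QUOTED = r'"(?:\\.|[^"\\])*"'

# match_this_but_dont_save_it | (keep_this)
rx = re.compile(QUOTED + "|(" + SYMBOLS + ")")


def symbols_outside_quotes(text):
    """Return the symbol characters that are not inside double-quoted strings."""
    return [m.group(1) for m in rx.finditer(text) if m.group(1) is not None]


print(symbols_outside_quotes('x += "hello + world"'))    # ['+', '=']
print(symbols_outside_quotes('"" + "a+b"'))              # ['+']
print(symbols_outside_quotes(r'y = "say \"hi+\"" - 1'))  # ['=', '-']
```

Because the quoted-string alternative is tried first at every position, the engine consumes a whole string literal before the symbol alternative ever gets a chance to look inside it.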

You might want to read more about (*SKIP)(*FAIL).

Upvotes: 1

Aniruddh Agarwal

Reputation: 917

I think you can do one thing: once a " appears, stop applying the regex until the next occurrence of " comes.
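A minimal sketch of this idea in Python, without regular expressions: walk the string once and toggle a flag at every double quote. The function name is hypothetical, and the sketch assumes quotes are never escaped inside strings.

```python
SYMBOLS = set("{}()[].,;+-*/&|<>=~")


def symbols_outside_quotes(text):
    """Scan left to right, toggling an in-quote flag at every double quote."""
    inside_quotes = False
    found = []
    for ch in text:
        if ch == '"':
            inside_quotes = not inside_quotes  # entering or leaving a string
        elif not inside_quotes and ch in SYMBOLS:
            found.append(ch)
    return found


print(symbols_outside_quotes('x += "hello + world"'))  # ['+', '=']
```

This does the same job as the regex answers in a single linear pass, and it is easy to extend (for example, to skip a character after a backslash) if the input needs it.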

Upvotes: 0
