Harsh Bafna
Reputation: 2224

Python regex: skipping delimiters that appear between quotes

I am new to regex and am trying to split a string using " and " / " or " as delimiters, but only when they appear outside quoted substrings.

I used the solution provided in https://stackoverflow.com/a/18893443/5164936

and modified my regex as follows:

re.split(r'(\s+and\s+|\s+or\s+)(?=(?:[^"]*"[^"]*")*[^"]*$)', s)

This works like a charm for the majority of my use cases, except for the following input:

'col1 == "val1" or col2 == \'val1 and " val2\''

The split fails for this particular case, and I have tried modifying the above regex in different ways with no luck. Can someone please help me fix this regex?
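
For reference, here is a minimal reproduction of the failing case. The comment about the cause is my reading of the lookahead, not something confirmed in the linked answer: it seems to require an even number of double quotes after the delimiter, and this input has a lone double quote inside the single-quoted value.

```python
import re

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
pattern = r'(\s+and\s+|\s+or\s+)(?=(?:[^"]*"[^"]*")*[^"]*$)'

# The lookahead only accepts a delimiter when the rest of the string
# contains an even number of double quotes. Here the remainder after
# both " or " and " and " holds exactly one, so no split happens.
parts = re.split(pattern, s)
print(parts)  # the input comes back as a single, unsplit element
```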

Upvotes: 1

Views: 99

Answers (1)

Wiktor Stribiżew
Reputation: 626952

You may use a solution based on the PyPI regex module:

import regex

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
res = regex.split(r'''(?V1)(?:"[^"]*"|'[^']*')\K|(\s+(?:and|or)\s+)''', s)
print([x for x in res if x])
# => ['col1 == "val1"', ' or ', 'col2 == \'val1 and " val2\'']

See the Python demo online.

Details

  • (?V1) - inline flag that enables the regex module's version 1 behavior, which allows splitting at zero-length matches
  • (?:"[^"]*"|'[^']*')\K - a substring in between double or single quotation marks that is discarded from the match value using the \K match reset operator (thus, when this pattern matches, the match is an empty string)
  • | - or
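  • (\s+(?:and|or)\s+) - Capturing group 1: one or more whitespace characters, then "and" or "or", then one or more whitespace characters again. Because the delimiter is captured, it is kept in the split result.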
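
If installing the PyPI regex module is not an option, the same idea (consume quoted substrings first, split only on delimiters found outside them) can be sketched with the standard re module. This is a rough alternative, not the approach above: the helper name split_outside_quotes is made up for illustration, and it assumes quotes are balanced and contain no escaped quote characters.

```python
import re

# Quoted substrings are listed first in the alternation, so an "and"/"or"
# inside quotes is consumed as part of the quoted token and never reaches
# the delimiter branch. Only the delimiter branch is captured (group 1).
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|(\s+(?:and|or)\s+)''')

def split_outside_quotes(s):
    """Split s on and/or delimiters that are not inside quotes.

    Hypothetical helper; delimiters are kept in the result, as in the
    regex.split() call above.
    """
    parts = []
    start = 0
    for m in TOKEN.finditer(s):
        if m.group(1) is None:
            continue  # a quoted substring: leave it in the current chunk
        parts.append(s[start:m.start()])  # text before the delimiter
        parts.append(m.group(1))          # the delimiter itself
        start = m.end()
    parts.append(s[start:])               # trailing chunk
    return parts

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
print(split_outside_quotes(s))
# => ['col1 == "val1"', ' or ', 'col2 == \'val1 and " val2\'']
```

Unlike the \K version, this one needs no filtering of empty strings or None values for this input, at the cost of a small loop instead of a single split call.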

Upvotes: 1
