sassy_rog

Reputation: 1097

re.split to split an expression and keep delimiters, with brackets involved

I'm working with multiple expressions that look like this: C=>E, A+B+C=>D, A+B<=>C, and (F|G)+H=>E. I am trying to use re.split() to split on => or <=>. I also want to split on the three operators +, | and ^ without touching what's inside brackets.

As a first attempt, I tried this:

re.split(r"<=>|=>", "A+B+C=>D")

but the problem is that it drops the delimiter, splitting a line like A+B+C=>D into

["A+B+C", "D"]

whereas I'm trying to achieve

["A+B+C", "=>", "D"]

There is also a problem with the operators. When I try to split (A+B)|C=>D like this

re.split(r"\+|=>|<=>|\^|\|", "(A+B)|C=>D")

I get

["(A", "B)", "C", "D"]

whereas I'm trying to achieve

["(A+B)", "|", "C", "=>", "D"]

I'm not very good with regex, so I need help finding a regular expression robust enough to do this in one go. If that's not possible with regex, I'd at least like a better way of doing it.

Upvotes: 1

Views: 1149

Answers (2)

Wiktor Stribiżew

Reputation: 626929

You may use

re.findall(r'\([^()]*\)|<?=>|[-+/*|^]|\w+', s)

Details

  • \([^()]*\) - a parenthesized substring
  • | - or
  • <?=> - a <=> or =>
  • | - or
  • [-+/*|^] - one of the chars defined in the character class (to match any non-word and non-whitespace char, you may replace it with [^\w\s])
  • | - or
  • \w+ - word chars, 1 or more (you may make it more precise as needed: [A-Z]+ will match 1 or more uppercase letters, [a-zA-Z]+ will match 1+ letters)
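
As a quick check, the pattern can be run over the expressions from the question. This is only a sketch: the names TOKEN_RE and tokenize are made up for illustration, and it assumes brackets are never nested, since \([^()]*\) only matches a group with no inner brackets.

```python
import re

# Pattern from the answer: bracketed group, arrow, single operator, or word.
TOKEN_RE = re.compile(r"\([^()]*\)|<?=>|[-+/*|^]|\w+")

def tokenize(expr):
    """Return the tokens of expr, keeping operators and bracketed groups."""
    # findall returns whole matches here because the pattern has no groups.
    return TOKEN_RE.findall(expr)

for expr in ["C=>E", "A+B+C=>D", "A+B<=>C", "(F|G)+H=>E", "(A+B)|C=>D"]:
    print(expr, "->", tokenize(expr))
# The last line printed is:
# (A+B)|C=>D -> ['(A+B)', '|', 'C', '=>', 'D']
```

Note that re.findall silently skips any characters the pattern does not match (whitespace, for example), so unexpected input is dropped rather than reported.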

Upvotes: 1

DeepSpace

Reputation: 81614

All you need is a capture group:

import re

print(re.split(r"(\^|=>)", "A+B+C=>D"))
# ['A+B+C', '=>', 'D']
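
The capture group keeps the delimiter, but it does not protect what is inside brackets. If regex turns out to be awkward for that (nested brackets in particular), one possible alternative is a small scanner that tracks bracket depth and only splits at depth 0. The sketch below is not from either answer; split_top_level is a hypothetical helper name, and it assumes the input has no whitespace and balanced brackets.

```python
# Operators to split on; "<=>" is listed before "=>" so the longer arrow wins.
OPERATORS = ("<=>", "=>", "+", "|", "^")

def split_top_level(expr):
    """Split expr on the operators, keeping them, but never inside brackets."""
    tokens, current, depth = [], "", 0
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0:
            # Only look for an operator when we are outside all brackets.
            op = next((o for o in OPERATORS if expr.startswith(o, i)), None)
            if op is not None:
                if current:
                    tokens.append(current)
                tokens.append(op)
                current = ""
                i += len(op)
                continue
        current += ch
        i += 1
    if current:
        tokens.append(current)
    return tokens

print(split_top_level("(A+B)|C=>D"))
# ['(A+B)', '|', 'C', '=>', 'D']
print(split_top_level("((A+B)|C)^D=>E"))
# ['((A+B)|C)', '^', 'D', '=>', 'E']
```

Unlike a pattern built on [^()]*, the depth counter keeps nested groups such as ((A+B)|C) in one piece.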

Upvotes: 1
