SL122

Reputation: 13

Regex (Python) - Removing a line starting with a delimiter and keeping other delimiters

How do I separate something like this:

; Remove this line  
     (?A or :B
        (G + D))

I want to remove the lines that start with ';' and split the rest into tokens using a regex in Python: whitespace is a delimiter and should be dropped, while '(' and ')' are delimiters that should be kept as tokens.

The end result should be something like:

['(', '?A', 'or', ':B', '(', 'G', '+', 'D', ')', ')']

But I can't eliminate the ';' line, nor can I split '(' and ')' out as their own tokens.

So far I have this:

re.split('[;.*]*[^()\[\]:?a-zA-Z0-9-]+', text)

Upvotes: 1

Views: 154

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You may use

import re
rx = r'^;.*|([()])|\s+'
s = """; Remove this line  
     (?A or :B
        (G + D))"""
print(list(filter(None, re.split(rx, s, flags=re.M))))
# => ['(', '?A', 'or', ':B', '(', 'G', '+', 'D', ')', ')']

See the Python demo

Details

  • ^;.* - start of a line (flags=re.M makes ^ match at the start of every line, not only at the start of the string), then ; and any zero or more characters other than line break characters
  • | - or
  • ([()]) - Capturing group 1: a ( or ) character. Because it is captured, re.split keeps each match in the resulting list
  • | - or
  • \s+ - one or more whitespace characters (not captured, so these matches are left out of the result).
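
To see why the filter(None, ...) step is needed, it may help to look at what re.split returns before filtering. The sketch below uses a shortened, hypothetical input (the name sample and its contents are not from the question): alternatives that match without capturing contribute None for group 1, and adjacent delimiters leave empty strings between them.

```python
import re

rx = r'^;.*|([()])|\s+'
sample = "; c\n(a b)"  # hypothetical shortened input, not from the question

# Raw result: None where group 1 did not participate in the match,
# '' where two delimiters sit next to each other.
raw = re.split(rx, sample, flags=re.M)
print(raw)
# => ['', None, '', None, '', '(', 'a', None, 'b', ')', '']

# Dropping falsy items (None and '') leaves only the tokens;
# this has the same effect as list(filter(None, raw)).
tokens = [t for t in raw if t]
print(tokens)
# => ['(', 'a', 'b', ')']
```

Both None and the empty string are falsy, which is why a single filter(None, ...) call is enough to clean up the list.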

Upvotes: 1
