Allan Tanaka

Reputation: 307

Regex Pattern using bracket '[]'

How can I split
3[a]2[b4[F]c] into 3[a] and 2[b4[F]c]
or
3[a]2[bb] into 3[a] and 2[bb] using re.split?

I tried the following pattern:

(\d+)\[(.*?)\]

but the output gives me 3a and 2b4[F.

Upvotes: 2

Views: 57

Answers (2)

Wiktor Stribiżew

Reputation: 627044

You can't do that reliably with re.split for arbitrarily nested brackets, since re does not support recursion.

You can match and extract numbers that are followed by nested square brackets using the PyPI regex module:

import regex
s = "3[a]2[b4[F]c]"
print([x.group() for x in regex.finditer(r'\d+(\[(?:[^][]++|(?1))*])', s)])
# => ['3[a]', '2[b4[F]c]']


Pattern details

  • \d+ - 1+ digits
  • (\[(?:[^][]++|(?1))*]) - Group 1:
    • \[ - a [ char
    • (?:[^][]++|(?1))* - 0 or more sequences of
      • [^][]++ - 1+ chars other than [ and ] (possessively for better performance)
      • | - or
      • (?1) - a subroutine triggering Group 1 recursion at this location
    • ] - a ] char.
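
If installing the third-party regex module is not an option, the same top-level grouping can be approximated without recursion by counting bracket depth by hand. This is only a minimal sketch of a different technique, not part of the answer above; the function name split_top_level is hypothetical, and it assumes the input is a sequence of balanced bracket groups. Unlike the finditer call, it returns everything between cut points rather than only the digits-plus-brackets matches.

```python
def split_top_level(s):
    """Split s into chunks that each end where a top-level ']' closes."""
    parts = []
    depth = 0   # current bracket nesting level
    start = 0   # index where the current chunk begins
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ']' at index {i}")
            if depth == 0:
                # A top-level group just closed: cut the chunk here.
                parts.append(s[start:i + 1])
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '['")
    if start < len(s):
        # Keep any trailing text that follows the last closed group.
        parts.append(s[start:])
    return parts


print(split_top_level("3[a]2[b4[F]c]"))  # ['3[a]', '2[b4[F]c]']
print(split_top_level("3[a]2[bb]"))      # ['3[a]', '2[bb]']
```

The depth counter plays the role of the recursion in the pattern: a chunk is only cut when the nesting level returns to zero.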

Upvotes: 1

The fourth bird

Reputation: 163447

If you want to use split, you can split at each position that has a ] on the left and a digit on the right (re.split only splits on such empty matches in Python 3.7 and later):

(?<=])(?=\d)


Example code

import re

# Split at each empty position preceded by ']' and followed by a digit.
pattern = r"(?<=])(?=\d)"
strings = [
    "3[a]2[b4[F]c]",
    "3[a]2[bb]",
]

for s in strings:
    print(re.split(pattern, s))

Output

['3[a]', '2[b4[F]c]']
['3[a]', '2[bb]']
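
One caveat worth checking against your own data: the lookarounds do not track nesting, so the split also fires inside a group wherever a ] is directly followed by a digit. The sketch below illustrates this with an input that is not from the question (the string "2[a3[b]4[c]]" and the helper name split_on_bracket_digit are made up for illustration), and it assumes Python 3.7 or later.

```python
import re

# Same pattern as in the answer: an empty match between ']' and a digit.
PATTERN = r"(?<=])(?=\d)"


def split_on_bracket_digit(s):
    """Hypothetical helper wrapping the re.split call from the answer."""
    return re.split(PATTERN, s)


# Works for the inputs from the question.
print(split_on_bracket_digit("3[a]2[b4[F]c]"))  # ['3[a]', '2[b4[F]c]']

# Also splits inside a single nested group, because "]4" occurs at depth 1.
print(split_on_bracket_digit("2[a3[b]4[c]]"))   # ['2[a3[b]', '4[c]]']
```

If a ] followed by a digit can appear inside the brackets, an approach that tracks nesting is the safer choice.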

Upvotes: 1
