Reputation: 307
How can I separate 3[a]2[b4[F]c] into 3[a] and 2[b4[F]c], or 3[a]2[bb] into 3[a] and 2[bb], using re.split?
I tried the following pattern:
(\d+)\[(.*?)\]
but the output gives me 3a and 2b4[F.
Upvotes: 2
Views: 57
Reputation: 627044
You can't do that with re.split by matching the nested groups themselves, since the built-in re module does not support recursion.
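To make the limitation concrete, here is a minimal sketch using only the standard re module. The pattern is a simplified form of the one in the question (capturing groups dropped), and the variable names are illustrative. The lazy .*? stops at the first closing bracket, so the nested group is cut short:

```python
import re

s = "3[a]2[b4[F]c]"

# Lazy .*? ends at the FIRST "]", so it cannot balance nested brackets.
matches = [m.group() for m in re.finditer(r"\d+\[.*?\]", s)]
print(matches)
# => ['3[a]', '2[b4[F]']
```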
You may match and extract numbers that are followed by nested square brackets using the PyPI regex module:
import regex
s = "3[a]2[b4[F]c]"
print( [x.group() for x in regex.finditer(r'\d+(\[(?:[^][]++|(?1))*])', s)] )
# => ['3[a]', '2[b4[F]c]']
Pattern details
\d+ - 1+ digits
(\[(?:[^][]++|(?1))*]) - Group 1:
  \[ - a [ char
  (?:[^][]++|(?1))* - 0 or more sequences of
    [^][]++ - 1+ chars other than [ and ] (possessively for better performance)
    | - or
    (?1) - a subroutine triggering Group 1 recursion at this location
  ] - a ] char.
Upvotes: 1
Reputation: 163447
If you want to use split, you might assert that what is on the left is a ] and what is on the right is a digit:
(?<=])(?=\d)
Example code
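import re

# Zero-width split point: a ] on the left and a digit on the right.
# Splitting on an empty match requires Python 3.7+.
pattern = r"(?<=])(?=\d)"
strings = [
    "3[a]2[b4[F]c]",
    "3[a]2[bb]"
]
for s in strings:
    print(re.split(pattern, s))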
Output
['3[a]', '2[b4[F]c]']
['3[a]', '2[bb]']
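One caveat worth checking against your real data: the lookaround splits at every ] that is followed by a digit, including ones inside an outer pair of brackets. The sketch below compares it with a small depth-tracking splitter. The helper name split_top_level and the sample input with sibling groups are assumptions made for illustration; neither comes from the question.

```python
import re

def split_top_level(s):
    """Split s after every "]" that closes a top-level bracket group."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # A top-level group just closed; cut right after this bracket.
                parts.append(s[start:i + 1])
                start = i + 1
    if start < len(s):
        # Keep any trailing text that follows the last top-level group.
        parts.append(s[start:])
    return parts

# Hypothetical input: two sibling groups nested inside the second group.
nested_siblings = "3[a]2[b4[F]5[G]c]"

lookaround_result = re.split(r"(?<=])(?=\d)", nested_siblings)
depth_result = split_top_level(nested_siblings)

print(lookaround_result)
# => ['3[a]', '2[b4[F]', '5[G]c]']
print(depth_result)
# => ['3[a]', '2[b4[F]5[G]c]']
```

For the two strings in the question both approaches give the same result, so the lookaround split is enough unless inputs like the one above can occur.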
Upvotes: 1