Reputation: 12134
I'm just getting started with regular expressions and want to split specially formatted strings. Each letter outside parentheses should become a separate element of the output list, while letters grouped inside parentheses should stay together as a single element.
Samples:
my string => wanted list
"ab(hpl)x"
=> ['a', 'b', 'hpl', 'x']
"(pck)(kx)(sd)"
=> ['pck', 'kx', 'sd']
"(kx)kxx(kd)"
=> ['kx', 'k', 'x', 'x', 'kd']
"fghk"
=> ['f', 'g', 'h', 'k']
How can this be achieved with regular expressions and re.split?
Thanks in advance for your help.
Upvotes: 0
Views: 4720
Reputation: 208725
This cannot be done with re.split, as it would require splitting on zero-length matches, which re.split did not support before Python 3.7.
From http://docs.python.org/library/re.html#re.split:
Note that split will never split a string on an empty pattern match.
Here is an alternative:
re.findall(r'(\w+(?=\))|\w)', your_string)
And an example:
>>> import re
>>> for s in ("ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"):
...     print(s, "=>", re.findall(r'(\w+(?=\))|\w)', s))
...
ab(hpl)x => ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd) => ['pck', 'kx', 'sd']
(kx)kxx(kd) => ['kx', 'k', 'x', 'x', 'kd']
fghk => ['f', 'g', 'h', 'k']
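To illustrate why splitting on the delimiters alone falls short, here is a minimal Python 3 sketch. It contrasts a naive re.split on the parentheses with the findall pattern from this answer. The names TOKEN and split_groups are hypothetical, chosen only for this example.

```python
import re

# Pattern taken from the answer above: a run of word characters that is
# immediately followed by ")", or otherwise a single word character.
TOKEN = re.compile(r'(\w+(?=\))|\w)')

def split_groups(text):
    """Return single letters and parenthesised groups as a flat list."""
    return TOKEN.findall(text)

# Splitting on the parentheses keeps "ab" together, which is not wanted.
naive = re.split(r'[()]', "ab(hpl)x")
print(naive)                     # ['ab', 'hpl', 'x']

# findall yields one item per letter outside the parentheses.
print(split_groups("ab(hpl)x"))  # ['a', 'b', 'hpl', 'x']
```

Note that this pattern only checks for a closing parenthesis after the run, so it assumes the input is well formed.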
Upvotes: 5
Reputation: 45562
You want findall, not split. Use this regex: r'(?<=\()[a-z]+(?=\))|[a-z]', which works for all your test cases.
>>> import re
>>> test_cases = ["ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"]
>>> pat = re.compile(r'(?<=\()[a-z]+(?=\))|[a-z]')
>>> for test_case in test_cases:
...     print("%-13s => %s" % (test_case, pat.findall(test_case)))
...
ab(hpl)x      => ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd) => ['pck', 'kx', 'sd']
(kx)kxx(kd)   => ['kx', 'k', 'x', 'x', 'kd']
fghk          => ['f', 'g', 'h', 'k']
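For readers new to lookaround assertions, the same pattern can be spelled out with re.VERBOSE so that each part carries a comment. This is only an annotated restatement of the regex above; the name verbose_pat is hypothetical.

```python
import re

# Same regex as above, written in verbose mode. Whitespace and "#" comments
# inside the pattern are ignored by the regex engine.
verbose_pat = re.compile(r'''
      (?<=\()   # lookbehind: the match must be preceded by "("
      [a-z]+    # one or more lowercase letters (the grouped part)
      (?=\))    # lookahead: the match must be followed by ")"
    |           # otherwise
      [a-z]     # a single lowercase letter outside a group
''', re.VERBOSE)

print(verbose_pat.findall("(kx)kxx(kd)"))  # ['kx', 'k', 'x', 'x', 'kd']
```

Because the lookarounds consume no characters, the parentheses themselves never appear in the results.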
edit:
Replace [a-z] with \w if you want to match uppercase and lowercase letters, digits, and the underscore. You can remove the lookbehind assertion (?<=\() if your parentheses will never be unbalanced (e.g. "abc(def").
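As a rough check of what the lookbehind buys you, the sketch below runs both variants on a balanced string, on "abc(def", and on a made-up input with a stray closing parenthesis, "abc)def" (this last input is an assumption added for illustration, not one of the original test cases). The names with_lookbehind and without_lookbehind are hypothetical.

```python
import re

with_lookbehind = re.compile(r'(?<=\()[a-z]+(?=\))|[a-z]')
without_lookbehind = re.compile(r'[a-z]+(?=\))|[a-z]')

# Balanced input and an unclosed "(": both variants agree.
balanced = "(kx)kxx(kd)"
unclosed = "abc(def"
# Stray ")": only the lookbehind variant keeps the letters separate.
stray_close = "abc)def"

for s in (balanced, unclosed, stray_close):
    print("%-11s %s %s" % (s, with_lookbehind.findall(s),
                           without_lookbehind.findall(s)))
```

In this sketch the two variants differ only on the stray closing parenthesis, where the version without the lookbehind treats "abc" as a group.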
Upvotes: 1