Maciej Ziarko

Reputation: 12134

How to split my strings with re.split?

I'm beginning my adventure with regular expressions. I'm interested in splitting specially formatted strings. If a letter is not inside parentheses, it should become a separate element of the output list. Letters grouped inside parentheses should be kept together as one element.

Samples:

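my string => wanted list
ab(hpl)x => ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd) => ['pck', 'kx', 'sd']
(kx)kxx(kd) => ['kx', 'k', 'x', 'x', 'kd']
fghk => ['f', 'g', 'h', 'k']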

How can it be achieved with regular expressions and re.split? Thanks in advance for your help.

Upvotes: 0

Views: 4720

Answers (2)

Andrew Clark

Reputation: 208725

This cannot be done with re.split in Python 2 (or in Python 3 before 3.7), because it would require splitting on zero-length matches.

From http://docs.python.org/library/re.html#re.split:

Note that split will never split a string on an empty pattern match.
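A minimal Python 3 sketch of why split struggles with these strings. Splitting on the parentheses alone leaves "ab" glued together, and separating "fghk" into single letters needs a split at the empty positions between letters. The second call assumes Python 3.7 or later, the first release where re.split accepts empty matches; on older versions that call does not produce this result.

```python
import re

# Splitting on the parentheses alone keeps neighbouring letters together,
# so "ab" is not separated into "a" and "b".
parens_only = re.split(r"[()]", "ab(hpl)x")
print(parens_only)  # ['ab', 'hpl', 'x']

# Turning "fghk" into single letters means splitting at the empty positions
# between letters, i.e. on a zero-width match built only from lookarounds.
# Python 3.7+ splits on such matches; older versions do not.
between_letters = re.split(r"(?<=\w)(?=\w)", "fghk")
print(between_letters)  # ['f', 'g', 'h', 'k']
```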

Here is an alternative:

re.findall(r'(\w+(?=\))|\w)', your_string)

And an example:

>>> import re
>>> for s in ("ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"):
...     print(s, " => ", re.findall(r'(\w+(?=\))|\w)', s))
...
ab(hpl)x  =>  ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd)  =>  ['pck', 'kx', 'sd']
(kx)kxx(kd)  =>  ['kx', 'k', 'x', 'x', 'kd']
fghk  =>  ['f', 'g', 'h', 'k']
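The same pattern can be spelled out with re.VERBOSE so each branch carries a comment. This is a Python 3 sketch; the helper name split_groups is hypothetical, and the outer capturing group from the one-liner is dropped, which does not change the result because that group spanned the whole match anyway.

```python
import re

# The findall pattern above, written with re.VERBOSE
# (whitespace and "#" comments inside the pattern are ignored).
PATTERN = re.compile(r"""
    \w+ (?=\))   # a run of word characters immediately followed by ")"
    |            # otherwise
    \w           # a single word character
""", re.VERBOSE)


def split_groups(s):
    """Hypothetical helper: parenthesised runs whole, other letters one by one."""
    return PATTERN.findall(s)


for s in ("ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"):
    print(s, " => ", split_groups(s))
```

Note that this pattern only checks for the closing parenthesis, so a stray ")" as in "abc)def" makes "abc" come back as one group.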

Upvotes: 5

Steven Rumbalski

Reputation: 45562

You want findall, not split. Use this regex: r'(?<=\()[a-z]+(?=\))|[a-z]', which works for all your test cases.

>>> import re
>>> test_cases = ["ab(hpl)x", "(pck)(kx)(sd)", "(kx)kxx(kd)", "fghk"]
>>> pat = re.compile(r'(?<=\()[a-z]+(?=\))|[a-z]')
>>> for test_case in test_cases:
...     print("%-13s  =>  %s" % (test_case, pat.findall(test_case)))
...
ab(hpl)x       =>  ['a', 'b', 'hpl', 'x']
(pck)(kx)(sd)  =>  ['pck', 'kx', 'sd']
(kx)kxx(kd)    =>  ['kx', 'k', 'x', 'x', 'kd']
fghk           =>  ['f', 'g', 'h', 'k']

edit:

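Replace [a-z] with \w if you want to match upper and lower case letters, digits, and underscore. You can remove the lookbehind assertion (?<=\() if your parentheses are always balanced; it only makes a difference when a ")" has no matching "(", as in "abc)def", where the version without it would return "abc" as a group.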
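A small Python 3 sketch comparing the variants from the edit on a string with an unmatched closing parenthesis, plus the \w variant on mixed input. The variable names and the sample strings "abc)def" and "Ab(C1_)x" are illustrative choices, not part of the original answer.

```python
import re

# Original pattern: a grouped run must be preceded by "(" and followed by ")".
with_lookbehind = re.compile(r"(?<=\()[a-z]+(?=\))|[a-z]")
# Same pattern without the lookbehind: only the ")" is checked.
without_lookbehind = re.compile(r"[a-z]+(?=\))|[a-z]")
# \w variant: also accepts upper case letters, digits and underscore.
word_chars = re.compile(r"(?<=\()\w+(?=\))|\w")

unbalanced = "abc)def"  # a ")" with no matching "("
print(with_lookbehind.findall(unbalanced))     # ['a', 'b', 'c', 'd', 'e', 'f']
print(without_lookbehind.findall(unbalanced))  # ['abc', 'd', 'e', 'f']
print(word_chars.findall("Ab(C1_)x"))          # ['A', 'b', 'C1_', 'x']
```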

Upvotes: 1
