fatman13

Reputation: 162

Python: Split string by pattern

My question is a variation on this one, and I can't seem to figure it out.

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = ["{abc, xyz}", "123", "{def, lmn, ijk}", "{uvw}", "opq"]

As in the example above, each item in the expected list is either a {..., ...} group or a plain string.

Many thanks in advance.

Upvotes: 2

Views: 2229

Answers (4)

bpceee

Reputation: 416

given = "{abc,{a:b}, xyz} , 123 , {def, lmn, ijk}, {uvw}, opq"
# Split on every comma, then glue the pieces back together until the braces balance.
tmp_l = [i.strip() for i in given.split(',')]
result_l = []
element = ''
count = 0  # number of curly brackets currently open
for i in tmp_l:
    # startswith/endswith are safe on empty pieces, unlike i[0] and i[-1]
    if i.startswith('{'):
        count += 1
    if i.endswith('}'):
        count -= 1
    element = element + i + ','
    if count == 0:
        result_l.append(element[:-1])  # drop the trailing comma
        element = ''

print(result_l)

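This one can handle nested curly brackets, although it is not very elegant. Note that the pieces are re-joined with a bare comma, so the original spacing inside the braces is lost.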
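The same counting idea can also be applied character by character, which keeps the original spacing inside each group. This is only a rough sketch: the function name split_top_level is made up for illustration, and it assumes the braces in the input are balanced.

```python
def split_top_level(text):
    """Split on commas that are not inside curly brackets.

    Hypothetical helper; assumes the braces in `text` are balanced.
    """
    parts = []
    current = []
    depth = 0  # how many '{' are currently open
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current item.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())  # the last item has no trailing comma
    return parts


given = "{abc,{a:b}, xyz} , 123 , {def, lmn, ijk}, {uvw}, opq"
print(split_top_level(given))
```

Because it only tracks a depth counter, it works for any nesting level, but it does not try to detect unbalanced input.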

Upvotes: 1

Furquan Khan

Reputation: 1594

You can use the regex below to do that. The rest is the same as in the similar question you linked.

import re

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
regex = r",?\s*(\{.*?\}|[^,]+)"

print(re.findall(regex, given))

Output: ['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']

Just import the re module and do the same as the link says. The pattern matches either a {...} group or any run of characters that contains no comma.
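To make the pieces of the pattern easier to see, here is the same regex written with re.VERBOSE and a comment on each part. The name ITEM is only a label chosen for this sketch.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
ITEM = re.compile(r"""
    ,?\s*            # optional separating comma and whitespace before the item
    (                # capture only the item itself
        \{.*?\}      #   either a brace group, matched lazily up to the first '}'
      | [^,]+        #   or a run of characters that contains no comma
    )
""", re.VERBOSE)

given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
result = ITEM.findall(given)
print(result)
```

Since the pattern has exactly one capturing group, findall returns just the captured items without the leading comma and whitespace.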

Upvotes: 0

Douglas Denhartog

Reputation: 2054

Does the following not provide you with what you are looking for?

import re
given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
expected = re.findall(r'(\w+)', given)

I ran that in Terminal and got:

>>> import re
>>> given = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> expected = re.findall(r'(\w+)', given)
>>> expected
['abc', 'xyz', '123', 'def', 'lmn', 'ijk', 'uvw', 'opq']

Upvotes: 0

Xavier Combelle

Reputation: 11195

I think the following regexp fits the job. However, it only works if there are no nested curly brackets (nested curly brackets can't be parsed using regular expressions, as far as I know).

>>> import re
>>> s = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
>>> re.findall(r",?\s*(\{.*?\}|[^,]+)",s)
['{abc, xyz}', '123', '{def, lmn, ijk}', '{uvw}', 'opq']
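To illustrate the nesting limitation, here is a small check of the same pattern against a nested input. The nested string is a made-up example, not taken from the question.

```python
import re

pattern = r",?\s*(\{.*?\}|[^,]+)"

# Flat input from the question: every group comes back intact.
flat = "{abc, xyz}, 123, {def, lmn, ijk}, {uvw}, opq"
flat_result = re.findall(pattern, flat)

# Hypothetical nested input: the lazy .*? stops at the first '}',
# so the outer group is cut short and its tail becomes a separate item.
nested = "{a, {b, c}, d}, e"
nested_result = re.findall(pattern, nested)

print(flat_result)
print(nested_result)  # ['{a, {b, c}', 'd}', 'e']
```

If nested groups can occur, counting the open brackets while scanning the string is a more reliable approach than this pattern.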

Upvotes: 3
