CiaranWelsh

Reputation: 7681

Split a string into a list of tuples based selectively on specific commas within the string

I have a long Python string of the form:

string='Black<5,4>, Black<9,4>'

How can I split this string, and any other string of arbitrary length with the same form (i.e. ArbitraryString1<ArbitraryListOfIntegers1>, ArbitraryString2<ArbitraryListOfIntegers2>, ...), into a list of tuples?

For example, the following would be the desired output from string:

list_of_tuples=[('Black',[5,4]),('Black',[9,4])]

Usually I'd use string.split on the commas to produce a list and then a regex to separate the word from the <>, but since commas also delimit my indices (the contents of the <>), this doesn't work.
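
To illustrate the problem, here is a minimal sketch of the naive approach on the example string above. Splitting on every comma also cuts through the contents of the <>:

```python
# Naive split on every comma: the commas inside <...> are split as well.
string = 'Black<5,4>, Black<9,4>'
parts = string.split(',')
print(parts)  # ['Black<5', '4>', ' Black<9', '4>']
```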

Upvotes: 3

Views: 896

Answers (3)

dot.Py

Reputation: 5157

You can split at ", " (notice the whitespace) and then process the data.

Example Code:

string='Black<5,4>, Black<9,4>'

splitted_string = string.split(', ')

list_of_tuples = []
for s in splitted_string:
  d = s.replace("<", " <").split()

  color = d[0]
  # strip the angle brackets once, then split the numbers on the inner commas
  numbers = d[1].replace("<", "").replace(">", "").split(",")
  n1 = numbers[0]
  n2 = numbers[1]

  t = (color, [n1, n2])
  list_of_tuples.append(t)

print(list_of_tuples)

Output:

[('Black', ['5', '4']), ('Black', ['9', '4'])]
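
Note that the numbers come out as strings and only the first two are kept. If you need integers and lists of any length, a possible variant is sketched below. It assumes the items are always separated by exactly ", " and each item looks like Name<int,int,...>; parse_items is just a name I made up for the example:

```python
def parse_items(text):
    """Split on ', ' and turn each 'Name<1,2,...>' item into (name, [ints])."""
    result = []
    for item in text.split(', '):
        # partition splits at the first '<': 'Black<5,4>' -> 'Black', '<', '5,4>'
        name, _, rest = item.partition('<')
        numbers = [int(n) for n in rest.rstrip('>').split(',')]
        result.append((name, numbers))
    return result

print(parse_items('Black<5,4>, Black<9,4>'))
# [('Black', [5, 4]), ('Black', [9, 4])]
```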

Upvotes: 2

ewcz

Reputation: 13087

Alternatively, you could manually split on the commas that are not enclosed in <...> and then process the parts later:

string = 'Black<5,4>, Black<9,4>'

chunks = []
s = string + ','
N = len(s)
pos, level = 0, 0
for i in range(0, N):
    if s[i] == '<':
        level += 1

    elif s[i] == '>':
        level -= 1

    elif s[i] == ',':
        if level == 0:
            chunks.append(s[pos:i])
            pos = i+1

print(chunks)
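
This prints ['Black<5,4>', ' Black<9,4>'] (the second chunk keeps the space that followed the comma). One way the later processing could look, as a rough sketch that assumes every chunk has the form Name<int,int,...>:

```python
# Chunks as produced by the loop above.
chunks = ['Black<5,4>', ' Black<9,4>']

list_of_tuples = []
for chunk in chunks:
    chunk = chunk.strip()              # drop the space left after the comma
    open_pos = chunk.index('<')        # position of the opening bracket
    name = chunk[:open_pos]
    inner = chunk[open_pos + 1:-1]     # text between '<' and the final '>'
    list_of_tuples.append((name, [int(n) for n in inner.split(',')]))

print(list_of_tuples)
# [('Black', [5, 4]), ('Black', [9, 4])]
```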

Upvotes: 2

Wiktor Stribiżew

Reputation: 626774

You may use a regex to capture 1+ word chars before a < and capture everything inside <...> into another group, and then split the Group 2 contents on , and cast the values to int:

import re
s='Black<5,4>, Black<9,4>'
print([(x, list(map(int, y.split(',')))) for x,y in re.findall(r'(\w+)<([^<>]+)>', s)])
# => [('Black', [5, 4]), ('Black', [9, 4])]


Pattern details:

  • (\w+) - Group 1 (assigned to x): 1 or more word chars
  • < - a literal <
  • ([^<>]+) - Group 2 (assigned to y): 1 or more chars other than < and >
  • > - a literal >
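
To see what the two groups capture before any conversion, you can look at the raw re.findall result. This is only a small illustration on the same sample string:

```python
import re

s = 'Black<5,4>, Black<9,4>'
# With two capturing groups, findall returns one (group 1, group 2) tuple per match.
pairs = re.findall(r'(\w+)<([^<>]+)>', s)
print(pairs)  # [('Black', '5,4'), ('Black', '9,4')]
```

Group 2 is still a single string at this point, which is why it has to be split on the comma and converted afterwards.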

Upvotes: 6
