Reputation: 7681
I have a long Python string of the form:
string='Black<5,4>, Black<9,4>'
How can I split this string, and any other string of arbitrary length with the same form (i.e. ArbitraryString1<ArbitraryListOfIntegers1>, ArbitraryString2<ArbitraryListOfIntegers2>, ...), into a list of tuples?
For example, the following would be the desired output from string
:
list_of_tuples=[('Black',[5,4]),('Black',[9,4])]
Usually I'd use string.split
on the commas to produce a list and then regex to separate the word from the <>
but since I need to use commas to delimit my indices (the contents of the <>
), this doesn't work.
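To make the problem concrete, here is a small illustration of what a plain comma split does to the example string. It only demonstrates the failure described above and is not part of any solution.

```python
string = 'Black<5,4>, Black<9,4>'

# Splitting on every comma also cuts through the <...> index lists.
pieces = string.split(',')
print(pieces)
# ['Black<5', '4>', ' Black<9', '4>']
```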
Upvotes: 3
Views: 896
Reputation: 5157
You can split at ", "
(notice the whitespace) and then process the data.
Example Code:
string = 'Black<5,4>, Black<9,4>'
splitted_string = string.split(', ')
list_of_tuples = []
for s in splitted_string:
    # 'Black<5,4>' -> ['Black', '<5,4>']
    d = s.replace("<", " <").split()
    color = d[0]
    n1 = d[1].replace("<", "").replace(">", "").split(",")[0]
    n2 = d[1].replace("<", "").replace(">", "").split(",")[1]
    t = (color, [n1, n2])
    list_of_tuples.append(t)
print(list_of_tuples)
Output:
[('Black', ['5', '4']), ('Black', ['9', '4'])]
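The output above holds the numbers as strings and assumes exactly two per item, while the question asks for integer lists of arbitrary length. A possible variation on the same split-on-", " idea is sketched below. The function name parse_items is made up for illustration, and the sketch assumes items are separated by exactly ", " and that every <...> list is non-empty.

```python
def parse_items(text):
    """Split on ', ' and turn each 'Name<i,j,...>' item into (name, [ints])."""
    result = []
    for item in text.split(', '):
        # partition cuts at the first '<': 'Black<5,4>' -> 'Black', '<', '5,4>'
        name, _, rest = item.partition('<')
        numbers = [int(n) for n in rest.rstrip('>').split(',')]
        result.append((name, numbers))
    return result

print(parse_items('Black<5,4>, Black<9,4,7>'))
# [('Black', [5, 4]), ('Black', [9, 4, 7])]
```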
Upvotes: 2
Reputation: 13087
Alternatively, you could split manually on the commas that are not enclosed in <...>
and then process the parts later:
string = 'Black<5,4>, Black<9,4>'
chunks = []
s = string + ','  # trailing comma so the last chunk is flushed too
N = len(s)
pos, level = 0, 0  # start of current chunk, nesting depth inside <...>
for i in range(0, N):
    if s[i] == '<':
        level += 1
    elif s[i] == '>':
        level -= 1
    elif s[i] == ',':
        if level == 0:
            chunks.append(s[pos:i])
            pos = i+1
print(chunks)
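The "process the parts later" step is left open above. One way it might look is sketched here, starting from the chunks that the loop produces for the example string (note the leading space on the second chunk). It assumes each chunk has exactly one <...> group containing at least one integer.

```python
# Chunks as produced by the depth-aware split of 'Black<5,4>, Black<9,4>'.
chunks = ['Black<5,4>', ' Black<9,4>']

list_of_tuples = []
for chunk in chunks:
    # Drop the space left over after the separating comma,
    # then cut the name from the bracketed part.
    name, rest = chunk.strip().split('<', 1)
    indices = [int(n) for n in rest[:-1].split(',')]  # rest[:-1] removes '>'
    list_of_tuples.append((name, indices))

print(list_of_tuples)
# [('Black', [5, 4]), ('Black', [9, 4])]
```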
Upvotes: 2
Reputation: 626774
You may use a regex to capture 1+ word chars before a <
and capture everything inside <...>
into another group, and then split Group 2 contents with ,
casting the values to int:
import re
s='Black<5,4>, Black<9,4>'
print([(x, list(map(int, y.split(',')))) for x,y in re.findall(r'(\w+)<([^<>]+)>', s)])
# => [('Black', [5, 4]), ('Black', [9, 4])]
See the Python demo
Pattern details:
(\w+) - Group 1 (assigned to x): 1 or more word chars
< - a literal <
([^<>]+) - Group 2 (assigned to y): 1+ chars other than < and >
> - a literal >
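For readers who prefer the pattern breakdown inline, the same regex can be written in verbose mode with named groups. This is only a sketch: the names ITEM, name and indices are chosen here for illustration and do not appear in the answer.

```python
import re

# Same pattern as above, spelled out with named groups and comments.
ITEM = re.compile(r"""
    (?P<name>\w+)        # 1 or more word chars before the '<'
    <                    # a literal '<'
    (?P<indices>[^<>]+)  # 1+ chars other than '<' and '>'
    >                    # a literal '>'
""", re.VERBOSE)

s = 'Black<5,4>, Black<9,4>'
result = [(m.group('name'), [int(n) for n in m.group('indices').split(',')])
          for m in ITEM.finditer(s)]
print(result)
# [('Black', [5, 4]), ('Black', [9, 4])]
```

Because the pattern only matches complete Name<...> items, the comma and space between items are skipped without any separate splitting step.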
Upvotes: 6