Reputation: 11
I have strings in the following format, and I am finding it difficult to convert them into a list of tuples:
text = '[(Apple Fruit, 10.88), (Table Top, 1.09), (Kicks, 1.08), (La Liga, 1.05), (Camp Nou, 1.02), (Football Team, 0.82), (, 0.73), (Hattrick, 0.7), (Free kick, 0.68), (Ballon dOr, 0.6), (, 0.53), (Treble, 0.51), (Vinegar, 0.09), (Ronaldo, 0.07)]'
I want to convert this string into a list of tuples:
output = [('Apple Fruit', 10.88), ('Table Top', 1.09), ('Kicks', 1.08), ('La Liga', 1.05), ('Camp Nou', 1.02), ('Football Team', 0.82), ('', 0.73), ('Hattrick', 0.7), ('Free kick', 0.68), ('Ballon dOr', 0.6), ('', 0.53), ('Treble', 0.51), ('Vinegar', 0.09), ('Ronaldo', 0.07)]
I am not sure how to do this. Can someone please help?
Upvotes: 1
Views: 113
Reputation: 82765
Use regex with lookbehind and lookahead.
Ex:
import re
import ast
text = '[(Apple Fruit, 10.88), (Table Top, 1.09), (Kicks, 1.08), (La Liga, 1.05), (Camp Nou, 1.02), (Football Team, 0.82), (, 0.73), (Hattrick, 0.7), (Free kick, 0.68), (Ballon dOr, 0.6), (, 0.53), (Treble, 0.51), (Vinegar, 0.09), (Ronaldo, 0.07)]'
text = re.sub(r"(?<=\()([A-Za-z\s]+)", r'"\1"', text)  # Wrap the letters that follow each "(" in double quotes
text = re.sub(r"(?<=\()(?=,)", r'""', text)  # Insert an empty string where the name is missing, i.e. between "(" and ","
print(ast.literal_eval(text))
Output:
[('Apple Fruit', 10.88),
('Table Top', 1.09),
('Kicks', 1.08),
('La Liga', 1.05),
('Camp Nou', 1.02),
('Football Team', 0.82),
('', 0.73),
('Hattrick', 0.7),
('Free kick', 0.68),
('Ballon dOr', 0.6),
('', 0.53),
('Treble', 0.51),
('Vinegar', 0.09),
('Ronaldo', 0.07)]
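Note that the character class [A-Za-z\s] only quotes names made of letters and whitespace, so a name containing a digit or an apostrophe would be left partly unquoted and ast.literal_eval would fail. A minimal sketch of a more permissive variant, assuming names never contain commas or parentheses (quote_and_eval is a hypothetical helper name, not part of the answer above):

```python
import ast
import re

def quote_and_eval(text):
    """Quote the first field of every (name, number) pair, then evaluate the result."""
    # Match "(" followed by anything up to the first comma (possibly nothing).
    # repr() turns the captured name into a valid Python string literal.
    quoted = re.sub(r"\(([^,()]*),", lambda m: "(" + repr(m.group(1)) + ",", text)
    return ast.literal_eval(quoted)

print(quote_and_eval("[(Apple Fruit, 10.88), (, 0.73), (Ballon d'Or, 0.6)]"))
# [('Apple Fruit', 10.88), ('', 0.73), ("Ballon d'Or", 0.6)]
```

Because the empty name is matched by the same pattern, a separate substitution for the empty case is not needed here.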
Upvotes: 0
Reputation: 15872
You can try this:
import ast
text = '[(Apple Fruit, 10.88), (Table Top, 1.09), (Kicks, 1.08), (La Liga, 1.05), (Camp Nou, 1.02), (Football Team, 0.82), (, 0.73), (Hattrick, 0.7), (Free kick, 0.68), (Ballon dOr, 0.6), (, 0.53), (Treble, 0.51), (Vinegar, 0.09), (Ronaldo, 0.07)]'
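new_text = ''
comma_added = True
for char in text:
    # An opening parenthesis starts a tuple: open the quoted name
    if char == '(' and comma_added:
        new_text += '("'
        comma_added = False
        continue
    # The first comma inside a tuple ends the name: close the quote
    if char == ',' and not comma_added:
        new_text += '"'
        comma_added = True
    new_text += char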
print(ast.literal_eval(new_text))
Output:
[('Apple Fruit', 10.88),
('Table Top', 1.09),
('Kicks', 1.08),
('La Liga', 1.05),
('Camp Nou', 1.02),
('Football Team', 0.82),
('', 0.73),
('Hattrick', 0.7),
('Free kick', 0.68),
('Ballon dOr', 0.6),
('', 0.53),
('Treble', 0.51),
('Vinegar', 0.09),
('Ronaldo', 0.07)]
Or (very ugly!!!):
new_text = text.replace('), ','},').replace('(','("').replace(', ','", ').replace('},','), ')
print(ast.literal_eval(new_text))
Upvotes: 0
Reputation: 1615
import re
regex = re.compile(r'\((.*?)\)')
text = '[(Apple Fruit, 10.88), (Table Top, 1.09), (Kicks, 1.08), (La Liga, 1.05), (Camp Nou, 1.02), (Football Team, 0.82), (, 0.73), (Hattrick, 0.7), (Free kick, 0.68), (Ballon dOr, 0.6), (, 0.53), (Treble, 0.51), (Vinegar, 0.09), (Ronaldo, 0.07)]'
pairs = regex.findall(text)
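# Split each pair on its last comma, strip the name, and convert the number to float
list_of_tuples = [(name.strip(), float(value)) for name, value in (p.rsplit(',', 1) for p in pairs)]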
print(list_of_tuples)
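A variant of the same idea lets the pattern capture the name and the number separately, so that findall returns the pairs directly. This is only a sketch, assuming names contain no commas or parentheses and numbers are plain decimals; PAIR and parse_pairs are hypothetical names introduced for illustration:

```python
import re

# Group 1 is the name (may be empty), group 2 is a plain decimal number.
PAIR = re.compile(r"\(([^,()]*),\s*([-+]?\d+(?:\.\d+)?)\)")

def parse_pairs(text):
    """Return a list of (name, float) tuples found in text."""
    # With two capture groups, findall yields one (name, value) tuple per match.
    return [(name.strip(), float(value)) for name, value in PAIR.findall(text)]

print(parse_pairs('[(Apple Fruit, 10.88), (, 0.73), (Ronaldo, 0.07)]'))
# [('Apple Fruit', 10.88), ('', 0.73), ('Ronaldo', 0.07)]
```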
regex.findall scans the text variable and returns all matches.
Upvotes: 0
Reputation: 48367
You could use a convert function which splits the sequence and builds the list of tuples.
text = '[(Apple Fruit, 10.88), (Table Top, 1.09), (Kicks, 1.08), (La Liga, 1.05), (Camp Nou, 1.02), (Football Team, 0.82), (, 0.73), (Hattrick, 0.7), (Free kick, 0.68), (Ballon dOr, 0.6), (, 0.53), (Treble, 0.51), (Vinegar, 0.09), (Ronaldo, 0.07)]'
text = text.replace("[","").replace("]","")
def is_digit(value):
    # True for strings such as '10.88' or '-3'
    return value.lstrip('-').replace('.', '').isdigit()

def convert(in_str):
    result = []
    current_tuple = []
    for token in in_str.split(", "):
        chunk = token.replace("(", "").replace(")", "")
        if is_digit(chunk):
            chunk = float(chunk)
        current_tuple.append(chunk)
        # A closing parenthesis marks the end of the current tuple
        if ")" in token:
            result.append(tuple(current_tuple))
            current_tuple = []
    return result

print(convert(text))
Output:
[('Apple Fruit', 10.88), ('Table Top', 1.09), ('Kicks', 1.08), ('La Liga', 1.05), ('Camp Nou', 1.02), ('Football Team', 0.82), ('', 0.73), ('Hattrick', 0.7), ('Free kick', 0.68), ('Ballon dOr', 0.6), ('', 0.53), ('Treble', 0.51), ('Vinegar', 0.09), ('Ronaldo', 0.07)]
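One caveat with is_digit: it accepts a string like '1.2.3' (which float() then rejects), and it would also turn a purely numeric name into a float. A possible hardening, sketched under the assumption that pairs are separated by "), (" and that the number is always the last field, is to let float() decide and to convert only the second field. The names to_number and convert_pairs are hypothetical:

```python
def to_number(chunk):
    """Return chunk as a float if it parses as one, otherwise unchanged."""
    try:
        return float(chunk)
    except ValueError:
        return chunk

def convert_pairs(in_str):
    """Split '[(name, number), ...]' into a list of (name, number) tuples."""
    body = in_str.strip("[]").strip("()")  # drop the outer brackets and parentheses
    if not body:
        return []
    result = []
    for pair in body.split("), ("):
        # rpartition splits on the last ", ", so the name is kept as a string
        name, _, value = pair.rpartition(", ")
        result.append((name, to_number(value)))
    return result

print(convert_pairs('[(Apple Fruit, 10.88), (, 0.73), (Top 10, 1.5)]'))
# [('Apple Fruit', 10.88), ('', 0.73), ('Top 10', 1.5)]
```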
Upvotes: 1