Reputation: 1631
I have a huge mess of a nested list that looks something like this, just longer:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
Ultimately I want something that looks like this:
neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]
but I'm not sure how to deal with the double quotes inside the strings or how to split the fruits from the numbers, especially since some of the fruit names contain commas themselves. I've tried a bunch of things, but everything just seems to make it even more of a mess. Any suggestions would be greatly appreciated.
Upvotes: 1
Views: 252
Reputation: 29121
One more simple solution:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
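for i, x in enumerate(fruit_mess):
    # split from the right so commas inside the fruit name are left alone
    data = x[0].rstrip('\n').rsplit(',', 2)
    # strip the surrounding double quotes from the name, then convert the numbers
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]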
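To see why splitting from the right helps here, this is a small sketch on a single line. It assumes every line ends in exactly two numeric fields with no commas in them; the variable names are only for illustration.

```python
line = '"strawberry, banana",0,1.0\n'

# A plain split also cuts at the comma inside the fruit name
print(line.rstrip('\n').split(','))
# ['"strawberry', ' banana"', '0', '1.0']

# rsplit with maxsplit=2 only cuts at the last two commas,
# so the name stays in one piece (still wrapped in its quotes)
name, count, weight = line.rstrip('\n').rsplit(',', 2)

# Strip the surrounding double quotes and convert the numbers
print([name.strip('"'), int(count), float(weight)])
# ['strawberry, banana', 0, 1.0]
```

Note that this is not a real CSV parser: a name with escaped quotes inside it would need the csv module.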
Upvotes: 1
Reputation: 879919
Use the csv module (in the standard library) to handle the double-quoted fruits with commas in their names:
import csv
import io
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
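# flatten the list of lists into a single string:
data = '\n'.join(item[0].strip() for item in fruit_mess)
# csv.reader needs text in Python 3, so use io.StringIO (io.BytesIO only worked in Python 2)
reader = csv.reader(io.StringIO(data))
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]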
print(neat_fruit)
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
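As a possible shortcut, csv.reader accepts any iterable that yields one string per line, so the join and the in-memory file object may not be needed at all. A minimal sketch, assuming Python 3 and that each inner list holds exactly one line:

```python
import csv

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'],
              ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'],
              ['"orange, grape",0,1.0\n']]

# Feed the reader one string per row straight from the nested list;
# csv handles the quoting and the trailing newlines
reader = csv.reader(item[0] for item in fruit_mess)

# Each parsed row has three fields: name, integer, float
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]
print(neat_fruit)
```

This should give the same list as the version above, just without building one big string first.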
Upvotes: 6
Reputation: 336258
A regex-based solution:
>>> import re
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')
>>> neat_fruit = []
>>> for item in fruit_mess:
...     match = regex.match(item[0])
...     result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))]
...     neat_fruit.append(result)
...
>>> neat_fruit
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
Upvotes: 0