Reputation: 87
I have a string which looks like this:
Name: Thompson shipping co.
17VXCS947
Name: Orange juice no pulp, Price: 7, Weight: 2, Aisle:9, Shelf_life: 30,
67 Name: Orange juice pulp, Price: 7, Weight:2, Aisle:9, Shelf_life:30,
Photo is available,
Photo is available,
56GHIO098
Name: Cranberry Juice, Price: 3, Weight: 1, Aisle:9, Shelf_life:45,
Name: Lemonade, Price:1, Weight:1, Aisle:9, Shelf_life:10,
There are no new line characters and everything is one big string.
My end goal is to save them to an Excel sheet. I am trying to save these either to a list of lists, which looks like
[['Name: Thompson shipping co.'],['Name: Orange juice no pulp', 'Price: 7', 'Weight: 2', 'Aisle:9', 'Shelf_life: 30'],['Name: Orange juice pulp', 'Price: 7', 'Weight:2', 'Aisle:9', 'Shelf_life:30'],['.....']]
or to a dictionary.
My current solution is to use regex to find Name, Price, Weight, Aisle, Shelf_life with
re.findall('(?<=,)[^,]*Name:[^,]*(?=,)'),re.findall('(?<=,)[^,]*Price:[^,]*(?=,)'),re.findall('(?<=,)[^,]*Weight:[^,]*(?=,)')....
How do I save them to a list of lists or a dict? Thinking out loud, I could count the matches and start a new list after every 5th one, but the first Name occurrence is a corner case because it has no other fields.
Is there a neater way to do this?
Upvotes: 1
Views: 87
Reputation: 22087
Would you please try the following:
import re

# Named "text" rather than "str" to avoid shadowing the built-in str type.
text = '''
Name: Thompson shipping co.
17VXCS947
Name: Orange juice no pulp, Price: 7, Weight: 2, Aisle:9, Shelf_life: 30,
67 Name: Orange juice pulp, Price: 7, Weight:2, Aisle:9, Shelf_life:30,
Photo is available,
Photo is available,
56GHIO098
Name: Cranberry Juice, Price: 3, Weight: 1, Aisle:9, Shelf_life:45,
Name: Lemonade, Price:1, Weight:1, Aisle:9, Shelf_life:10,
'''.replace('\n', ' ')

# Split the whole string into chunks that each start with "Name:".
chunks = re.findall(r'\bName:\s*.+?(?=\s*\bName|$)', text)
# Extract the "key: value" pairs from each chunk x.
print([re.findall(r'\b\w+:\s*[^,]+', x) for x in chunks])
Output:
[['Name: Thompson shipping co. 17VXCS947'], ['Name: Orange juice no pulp', 'Price: 7', 'Weight: 2', 'Aisle:9', 'Shelf_life: 30'], ['Name: Orange juice pulp', 'Price: 7', 'Weight:2', 'Aisle:9', 'Shelf_life:30'], ['Name: Cranberry Juice', 'Price: 3', 'Weight: 1', 'Aisle:9', 'Shelf_life:45'], ['Name: Lemonade', 'Price:1', 'Weight:1', 'Aisle:9', 'Shelf_life:10']]
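The findall() call applied to the whole string creates a list of substrings, each of which starts with Name:.
The findall() call applied to each item x then creates a list of name: value pairs out of the list items created above.
As seen, the string 17VXCS947 is appended to the first list element. Text without a colon, such as 67 or "Photo is available", is dropped automatically, but 17VXCS947 survives because it follows the company name with no comma in between.
If you want to remove it, additional logic is needed to exclude it.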
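One possible way to add that logic, and to get closer to the Excel goal, is sketched below. It is only a sketch under assumptions: the `TRAILING_ID` pattern guesses that a stray code is a single trailing token of six or more uppercase letters and digits with at least one digit, which fits 17VXCS947 but may not fit your real data. The helper names `parse_records` and `to_csv` are made up for illustration. The records are written as CSV with the standard library, which Excel can open; writing a real .xlsx file would need a third-party package such as openpyxl or pandas.

```python
import csv
import io
import re

text = (
    "Name: Thompson shipping co. 17VXCS947 "
    "Name: Orange juice no pulp, Price: 7, Weight: 2, Aisle:9, Shelf_life: 30, "
    "67 Name: Orange juice pulp, Price: 7, Weight:2, Aisle:9, Shelf_life:30, "
    "Photo is available, Photo is available, 56GHIO098 "
    "Name: Cranberry Juice, Price: 3, Weight: 1, Aisle:9, Shelf_life:45, "
    "Name: Lemonade, Price:1, Weight:1, Aisle:9, Shelf_life:10,"
)

# Assumption: a stray code is one trailing token of 6+ uppercase
# letters/digits that contains at least one digit (e.g. 17VXCS947).
TRAILING_ID = re.compile(r'\s+(?=[A-Z0-9]*\d)[A-Z0-9]{6,}$')


def parse_records(s):
    """Return one dict per 'Name: ...' chunk found in s."""
    records = []
    for chunk in re.findall(r'\bName:\s*.+?(?=\s*\bName|$)', s):
        record = {}
        for pair in re.findall(r'\b\w+:\s*[^,]+', chunk):
            key, value = pair.split(':', 1)
            # Strip whitespace, then drop a trailing code if one is present.
            record[key.strip()] = TRAILING_ID.sub('', value.strip())
        records.append(record)
    return records


def to_csv(records):
    """Render the records as CSV text; missing fields become empty cells."""
    fields = ['Name', 'Price', 'Weight', 'Aisle', 'Shelf_life']
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval='')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = parse_records(text)
print(records[0])
print(to_csv(records))
```

Using dictionaries instead of 'key: value' strings means the company row, which only has a Name, simply leaves the other columns empty instead of needing special handling.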
Upvotes: 1