Reputation: 13
I am working on a web scraper which for example returns the main_list below.
main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g', 'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g', 'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g', 'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%', 'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']
I would like to split main_list into two separate lists, a key_list for the labels and a value_list for the numeric values, which could then be stored in a dictionary. I cannot use zip because some keys have multiple values.
key_list = []
for n in main_list:
    if n.startswith("E"):
        key_list.append(n)
    if n.startswith("V"):
        key_list.append(n)
    if n.startswith("W"):
        key_list.append(n)
    if n.startswith("K"):
        key_list.append(n)
    if n.startswith("Z"):
        key_list.append(n)
print(key_list)
This gives me the output I want:
['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium']
I know there should be a better way to do this, but I cannot find it.
I also tried this, with x += 1 somewhere:
values = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
key_list = [n for n in main_list if n.startswith(values[x])]
Help is very much appreciated.
Upvotes: 1
Views: 68
Reputation: 5973
If I understand correctly, you just want to list the strings in the list that don't start with a digit?
Let's start by getting the first character of each string. I like to use slices instead of direct access to avoid errors if the string is empty (when that's the desired behavior).
>>> [item[:1] for item in main_list]
['E', '3', 'V', '0', 'W', '0', 'W', '0', 'W', '0', 'K', '1', 'W', '1', 'V', '2', 'E', '2', 'Z', '0', 'V', '0', '2', 'V', '1', '1', 'K', '4', '2']
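As a side note on the slice-versus-index point above, here is a minimal sketch of how the two behave on an empty string. The variable names are only illustrative.

```python
empty = ''

# Slicing never raises: an empty string just gives back an empty string.
first_by_slice = empty[:1]

# Direct indexing raises IndexError on an empty string.
try:
    first_by_index = empty[0]
except IndexError:
    first_by_index = None

# Note that ''.isdigit() is False, so with a "does not start with a digit"
# test an empty string would be treated like a label, not a value.
empty_counts_as_digit = first_by_slice.isdigit()
```

Whether an empty string should count as a label is a judgment call; if the scraper can return empty strings, it may be simpler to drop them before filtering.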
Then let's check whether each character is not a digit. Fortunately, Python strings have a useful isdigit method.
>>> [not item[:1].isdigit() for item in main_list]
[True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, False, True, False, False, True, False, False]
However, you want to filter on this condition, not map to it, so let's change our list comprehension to reflect that.
>>> [item for item in main_list if not item[:1].isdigit()]
['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium']
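Since the question mentions storing the result in a dictionary where some keys have several values, the same test can drive a simple grouping loop. The following is only a sketch: it uses a shortened sample of main_list, the names sample, nutrients and current_key are made up for illustration, and it assumes every value follows the label it belongs to.

```python
# Shortened sample of the scraped list, for illustration only.
sample = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g',
          'Vitamine B6 / Pyridoxine', '0.3 mg', '21%',
          'Vitamine C', '14 mg', '18%']

nutrients = {}
current_key = None
for item in sample:
    if not item[:1].isdigit():
        # A label starts a new entry with an empty list of values.
        current_key = item
        nutrients[current_key] = []
    elif current_key is not None:
        # A value is attached to the most recent label.
        nutrients[current_key].append(item)

print(nutrients)
```

Each key maps to a list, so 'Vet' ends up with one value and 'Vitamine C' with two. If the same label can appear twice in the scraped data, the later occurrence would overwrite the earlier one in this sketch.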
Upvotes: 0
Reputation: 71451
You can use re:
import re
main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g', 'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g', 'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g', 'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%', 'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']
new_list = [i for i in main_list if not re.findall(r'\b\d+\b', i)]
Output:
['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium']
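One thing to be aware of: the pattern \b\d+\b matches a standalone number anywhere in the string, so a label that contains one would be filtered out as well. The sketch below compares it with re.match(r'\d', ...), which only looks at the start of the string. The label 'Omega 3' is a made-up example and does not appear in the original data.

```python
import re

# 'Omega 3' is a hypothetical label, added only to show the difference.
labels = ['Energie', '375 kJ (88 kcal)', 'Omega 3', '0.3 mg', '21%']

# Drops every string containing a standalone number, wherever it occurs.
by_word_boundary = [s for s in labels if not re.findall(r'\b\d+\b', s)]

# Drops only the strings whose first character is a digit.
by_leading_digit = [s for s in labels if not re.match(r'\d', s)]

print(by_word_boundary)
print(by_leading_digit)
```

For the list in the question both patterns give the same result, because 'Vitamine B6 / Pyridoxine' has no word boundary between 'B' and '6'.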
Upvotes: 1