Bobmans

Reputation: 13

Finding list items by first letter, storing them in two different lists and combining them into a dictionary

I am working on a web scraper that returns, for example, the main_list below.

main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g', 'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g', 'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g', 'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%', 'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']

I would like to split the items of main_list into two separate lists: a key_list with the labels and a value_list with the numeric values, which could then be stored in a dictionary. I cannot use zip because some keys have more than one value (for example, 'Vitamine C' is followed by both '14 mg' and '18%').
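A minimal sketch of why zip falls apart here, assuming the naive approach of pairing every even-index item with the odd-index item after it. The pairs stay aligned only until 'Vitamine B6 / Pyridoxine', the first label that has two values.

```python
main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g',
             'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g',
             'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g',
             'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%',
             'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']

# Naive pairing: even-index items as keys, the following odd-index items as values.
naive = dict(zip(main_list[::2], main_list[1::2]))

print(naive['21%'])            # Vitamine C  (a value has become a key)
print('Vitamine C' in naive)   # False       (a label has become a value)
print(len(naive))              # 14          (15 candidate keys, only 14 values)
```

Because the list has an odd length, zip also silently drops the final '23%'.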


key_list = []
for n in main_list:
    if n.startswith("E"):
        key_list.append(n)
    if n.startswith("V"):
        key_list.append(n)
    if n.startswith("W"):
        key_list.append(n)
    if n.startswith("K"):
        key_list.append(n)
    if n.startswith("Z"):
        key_list.append(n)

print(key_list)

which gives me the following output that I want:

['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium'] 

I know there must be a better way to do this, but I cannot find it.

I also tried this:

values = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
key_list = [n for n in main_list if n.startswith(values[x])]
# ...with x += 1 somewhere, to step through the letters

Help is very much appreciated.

Upvotes: 1

Views: 68

Answers (2)

Brett Beatty

Reputation: 5973

So, if I understand correctly, you just want to list any strings in the list that don't start with a digit?

Let's start by getting the first character of each string. I like to use a slice (item[:1]) instead of direct indexing (item[0]): on an empty string the slice returns '' while indexing raises an IndexError, and returning '' is usually the desired behavior here.
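A quick sketch of that difference on a non-empty and an empty string, plus how the digit check used next treats the empty case:

```python
word = 'Energie'
empty = ''

# A slice past the end of a string returns whatever is there, possibly ''.
print(repr(word[:1]))   # 'E'
print(repr(empty[:1]))  # ''

# Direct indexing of an empty string raises instead.
try:
    empty[0]
except IndexError:
    print('empty[0] raised IndexError')

# ''.isdigit() is False, so an empty item would pass a
# "does not start with a digit" filter.
print(empty[:1].isdigit())  # False
```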

>>> [item[:1] for item in main_list]
['E', '3', 'V', '0', 'W', '0', 'W', '0', 'W', '0', 'K', '1', 'W', '1', 'V', '2', 'E', '2', 'Z', '0', 'V', '0', '2', 'V', '1', '1', 'K', '4', '2']

Then let's check whether each character is not a digit. Fortunately, Python strings have a useful isdigit method.

>>> [not item[:1].isdigit() for item in main_list]
[True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, False, True, False, False, True, False, False]

However, you want to filter on this condition, not map to it, so let's change our list comprehension to reflect that.

>>> [item for item in main_list if not item[:1].isdigit()]
['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium']
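The same first-character test can be extended to build the dictionary the question asks for, mapping each label to a list of the values that follow it. A minimal sketch, assuming labels never start with a digit and values always do (true for this main_list, not guaranteed for every scraped page); group_by_label is a hypothetical helper name:

```python
main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g',
             'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g',
             'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g',
             'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%',
             'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']


def group_by_label(items):
    """Hypothetical helper: map each label to the list of values after it."""
    result = {}
    current = None
    for item in items:
        if not item[:1].isdigit():
            # Does not start with a digit: treat it as a label.
            current = item
            result.setdefault(current, [])
        elif current is not None:
            # Starts with a digit: attach it to the most recent label.
            result[current].append(item)
        # Values that appear before any label are skipped.
    return result


nutrition = group_by_label(main_list)
print(nutrition['Energie'])     # ['375 kJ (88 kcal)']
print(nutrition['Vitamine C'])  # ['14 mg', '18%']
```

Using a list for every label keeps the single-value and multi-value cases uniform, and setdefault merges the values if a label ever appears twice instead of overwriting them.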

Upvotes: 0

Ajax1234

Reputation: 71451

You can use the re module to keep only the items that contain no standalone number:

import re
main_list = ['Energie', '375 kJ (88 kcal)', 'Vet', '0 g', 'Waarvan verzadigd', '0 g', 'Waarvan enkelvoudig onverzadigd', '0 g', 'Waarvan meervoudig onverzadigd', '0 g', 'Koolhydraten', '19 g', 'Waarvan suikers', '1 g', 'Voedingsvezel', '2 g', 'Eiwitten', '2 g', 'Zout', '0 g', 'Vitamine B6 / Pyridoxine', '0.3 mg', '21%', 'Vitamine C', '14 mg', '18%', 'Kalium/Potassium', '450 mg', '23%']
new_list = [i for i in main_list if not re.findall(r'\b\d+\b', i)]

Output:

['Energie', 'Vet', 'Waarvan verzadigd', 'Waarvan enkelvoudig onverzadigd', 'Waarvan meervoudig onverzadigd', 'Koolhydraten', 'Waarvan suikers', 'Voedingsvezel', 'Eiwitten', 'Zout', 'Vitamine B6 / Pyridoxine', 'Vitamine C', 'Kalium/Potassium']
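To see what the pattern actually matches, here is a short sketch on a few items copied from main_list. \b\d+\b needs a word boundary on both sides of the digits, so the '6' in 'B6' does not count, while '0.3 mg' (split at the dot) and '21%' do. This version swaps in re.search, which stops at the first match, for re.findall; both work as a truth test here.

```python
import re

# A run of digits with a word boundary on each side: a number standing on its own.
pattern = re.compile(r'\b\d+\b')

print(pattern.findall('0.3 mg'))                    # ['0', '3']
print(pattern.findall('21%'))                       # ['21']
print(pattern.findall('Vitamine B6 / Pyridoxine'))  # []  ('6' is glued to 'B')

sample = ['Vitamine B6 / Pyridoxine', '0.3 mg', '21%', 'Vitamine C']
labels = [i for i in sample if not pattern.search(i)]
print(labels)  # ['Vitamine B6 / Pyridoxine', 'Vitamine C']
```

Unlike the isdigit approach, this checks the whole string rather than only the first character, so a label that contained a standalone number would be filtered out as well.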

Upvotes: 1
