Reputation: 61
I have a dictionary whose keys are recipe IDs and whose values are lists of ingredients:
recipe_dictionary = { 134: ['salt', 'chicken', 'tomato paste canned'],
523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
12: ['chicken', 'rice', 'parsley']}
I also have a static list that contains ingredients that I don't want to repeat during the day:
non_repeatable_ingredients = ['egg', 'chicken', 'beef']
Right now I go through each value of the dictionary, loop through its ingredient names, compare each name to the non_repeatable_ingredients list, and build a list of the shared words. My reduced dictionary would then look like:
reduced_recipe_dictionary = { 134: ['chicken'],
523: ['egg'],
12: ['chicken']}
This process takes a long time because my real dictionaries and ingredients lists are long. Is there a faster way to do this than the one below?
This is the get_reduced_meal_plans_dictionry method:
from fuzzywuzzy import fuzz

reduced_meal_plans_dictionary = {}
# For each recipe
for recipe in meal_plans_dictionary:
    # Temp list for the overlapping ingredients found in this recipe
    overlapped_ingredients_list = []
    # For each complete ingredient name in the recipe
    for ingredient_complete_name in meal_plans_dictionary[recipe]:
        # Clean up the ingredient name, as it sometimes contains commas, parentheses or spaces
        ingredient_string = ingredient_complete_name.replace(',', '').replace('(', '').replace(')', '').lower().strip()
        # Compare each ingredient name against the list of ingredients that must not repeat in a day
        for each in PROTEIN_TAGS:
            # Compute the partial similarity
            partial_similarity = fuzz.partial_ratio(ingredient_string, each.lower())
            # If above 90, one of the ingredients in PROTEIN_TAGS exists in this recipe
            if partial_similarity > 90:
                # Collect such ingredients for this recipe
                overlapped_ingredients_list.append(each.lower())
    # Use the recipe ID as the key and the reduced overlap list as the value
    reduced_meal_plans_dictionary[recipe] = overlapped_ingredients_list
I am using replace and a similarity ratio because ingredient names are not as clean as in my example; for example, I could have egg or boiled egg as one ingredient.
Thank you.
Upvotes: 0
Views: 521
Reputation: 12015
>>> reduced_recipe_dictionary = {k: list(filter(lambda x: x in non_repeatable_ingredients, v)) for k,v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['chicken']}
If your ingredient names are not clean enough to match the items in the non_repeatable_ingredients list exactly, you can use fuzz.partial_ratio from the fuzzywuzzy module to find the ingredients that closely match (those with a ratio greater than, say, 80). Run pip install fuzzywuzzy to install it beforehand.
>>> from fuzzywuzzy import fuzz
>>> reduced_recipe_dictionary = {k: [x for x in non_repeatable_ingredients if any(fuzz.partial_ratio(ingredient, x) > 80 for ingredient in v)] for k, v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['chicken']}
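Since the same ingredient name usually appears in many recipes, one possible speed-up is to score each distinct name only once and reuse the result. Below is a minimal sketch of that idea. It assumes a plain substring test after cleanup is an acceptable stand-in for fuzz.partial_ratio (if it is not, the fuzzy call could go inside the cached function instead). The helper names clean and match_tags are hypothetical, and the 'Boiled egg (large)' entry is made up to exercise the cleanup.

```python
from functools import lru_cache

recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'Boiled egg (large)', 'salt'],
    12: ['chicken', 'rice', 'parsley'],
}
non_repeatable_ingredients = ('egg', 'chicken', 'beef')


def clean(name):
    # Same cleanup as in the question: drop commas and parentheses, lowercase, strip.
    return name.replace(',', '').replace('(', '').replace(')', '').lower().strip()


@lru_cache(maxsize=None)
def match_tags(ingredient):
    # Hypothetical helper: evaluated once per distinct ingredient name.
    # The substring test is an assumption standing in for fuzz.partial_ratio(...) > 90.
    cleaned = clean(ingredient)
    return tuple(tag for tag in non_repeatable_ingredients if tag in cleaned)


reduced = {
    recipe_id: [tag for ingredient in ingredients for tag in match_tags(ingredient)]
    for recipe_id, ingredients in recipe_dictionary.items()
}
print(reduced)
```

Repeated names such as 'salt' and 'chicken' are served from the cache on their second appearance, so the gain grows with how often ingredients recur across recipes.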
Upvotes: 0
Reputation: 1439
Using a combination of regex and a defaultdict, you can get exactly what you're looking for. This approach uses regex to reduce the number of for loops needed.
Note that I've adjusted key 12 to show that it will get both matches.
import re
from collections import defaultdict

recipe_dictionary = { 134: ['salt', 'chicken', 'tomato paste canned'],
                      523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
                      12: ['whole chicken', 'rice', 'parsley', 'egg']}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']
# Build one alternation pattern: (egg|chicken|beef)
non_repeat = '(' + '|'.join(non_repeatable_ingredients) + ')'
d = defaultdict(list)
for k, j in recipe_dictionary.items():
    for i in j:
        # Find the first non-repeatable ingredient inside this ingredient name
        m = re.search(non_repeat, i)
        if m:
            d[k].append(m.groups()[0])
d
defaultdict(list, {134: ['chicken'], 523: ['egg'], 12: ['chicken', 'egg']})
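A possible variation on the regex idea, offered as a sketch and not as the definitive version: compile the pattern once, escape each term with re.escape in case a name contains regex metacharacters, and add \b word boundaries so that a name like 'eggplant' does not count as 'egg'. Whether word boundaries are wanted is an assumption about the data, since they also stop 'eggs' from matching 'egg'.

```python
import re

recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
    12: ['whole chicken', 'rice', 'parsley', 'egg'],
}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']

# Compile once; the non-capturing group makes findall return the whole match.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, non_repeatable_ingredients)) + r')\b',
    re.IGNORECASE,
)

reduced = {}
for recipe_id, ingredients in recipe_dictionary.items():
    # findall collects every non-repeatable term in a name, not just the first.
    hits = [m.lower() for name in ingredients for m in pattern.findall(name)]
    if hits:
        reduced[recipe_id] = hits
print(reduced)
```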
Upvotes: 0
Reputation: 599
How about using sets instead of lists, since each recipe has unique ingredients and order doesn't matter?
A membership test on a set takes O(1) time on average, whereas on a list it takes O(n) time.
For example:
recipe_dictionary = {
134: set(['salt', 'chicken', 'tomato paste canned']),
523: set(['toast whole grain', 'feta cheese', 'egg', 'salt']),
12: set(['chicken', 'rice', 'parsley'])
}
non_repeatable_ingredients = set(['egg', 'chicken', 'beef'])
You can test for an element's presence in a set like this:
for ingredient in recipe_dictionary[134]:
    if ingredient in non_repeatable_ingredients:
        # do something
        pass
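With sets on both sides, the explicit loop can also be replaced by a set intersection. The sketch below shows that, plus a word-level variant for names that are not clean (for example 'boiled egg'). The variant assumes that splitting names on whitespace is enough to isolate the ingredient word, and the helper name words and the sample dictionary messy are hypothetical.

```python
recipe_dictionary = {
    134: {'salt', 'chicken', 'tomato paste canned'},
    523: {'toast whole grain', 'feta cheese', 'egg', 'salt'},
    12: {'chicken', 'rice', 'parsley'},
}
non_repeatable_ingredients = {'egg', 'chicken', 'beef'}

# Exact names: intersect each recipe's set with the non-repeatable set.
reduced_recipe_dictionary = {
    recipe_id: sorted(ingredients & non_repeatable_ingredients)
    for recipe_id, ingredients in recipe_dictionary.items()
}
print(reduced_recipe_dictionary)


def words(ingredients):
    # Hypothetical helper: every lowercase word used in a recipe's ingredient names.
    return {word for name in ingredients for word in name.lower().split()}


# Messy names: intersect on individual words instead of whole names.
messy = {7: {'Boiled egg', 'whole chicken', 'rice'}}
reduced_messy = {
    recipe_id: sorted(words(ingredients) & non_repeatable_ingredients)
    for recipe_id, ingredients in messy.items()
}
print(reduced_messy)
```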
Upvotes: 1