Reputation: 61

Python - Reduce dictionary value lists to smaller lists

I have a dictionary whose keys are recipe IDs and whose values are lists of ingredients:

recipe_dictionary  = { 134: ['salt', 'chicken', 'tomato paste canned'],
                       523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
                       12: ['chicken', 'rice', 'parsley']}

I also have a static list that contains ingredients that I don't want to repeat during the day:

non_repeatable_ingredients = ['egg', 'chicken', 'beef']

Right now I go through each value of the dictionary, loop through its ingredient names, compare each name against the non_repeatable_ingredients list, and build a list of the shared words. My reduced dictionary would then look like:

reduced_recipe_dictionary  = { 134: ['chicken'],
                               523: ['egg'],
                               12: ['chicken']}

This process takes a long time because my real dictionaries and ingredient lists are long. Is there a faster way to do this than the code below?

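This is the body of my get_reduced_meal_plans_dictionary method (in the real code the dictionary is called meal_plans_dictionary and the non-repeatable list is PROTEIN_TAGS):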

reduced_meal_plans_dictionary = {}

# For each recipe
for recipe in meal_plans_dictionary:

    # Temporary list of overlapping ingredients found for this recipe
    overlapped_ingredients_list = []

    # For each complete name of ingredient in the recipe
    for ingredient_complete_name in meal_plans_dictionary[recipe]:

        # Clean up the ingredient name as it sometimes involves comma, parentheses or spaces
        ingredient_string = ingredient_complete_name.replace(',', '').replace('(', '').replace(')', '').lower().strip()

        # Compare each ingredient name against the list of ingredients that must not be repeated in a day
        for each in PROTEIN_TAGS:

            # Compute the partial similarity
            partial_similarity = fuzz.partial_ratio(ingredient_string, each.lower())

            # A score above 90 means one of the ingredients in PROTEIN_TAGS exists in this recipe
            if partial_similarity > 90:
                # Make a list of such ingredients for this recipe
                overlapped_ingredients_list.append(each.lower())

    # Place the recipe ID as the key and the reduced overlapped list as the value
    reduced_meal_plans_dictionary[recipe] = overlapped_ingredients_list

I am using replace and a similarity ratio because the ingredient names are not as clean as in my example; for instance, I could have egg or boiled egg as one ingredient.

Thank you.

Upvotes: 0

Answers (3)

Sunitha

Reputation: 12015

>>> reduced_recipe_dictionary = {k: list(filter(lambda x: x in non_repeatable_ingredients, v)) for k,v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['chicken']}

If your ingredient names are not clean enough to match the items in the non_repeatable_ingredients list exactly, you can use fuzz.partial_ratio from the fuzzywuzzy module to find the ingredients that closely match (for example, the ones with a ratio greater than 80). Install it beforehand with pip install fuzzywuzzy.

>>> from fuzzywuzzy import fuzz
>>> reduced_recipe_dictionary = {k: [x for x in non_repeatable_ingredients if any(fuzz.partial_ratio(i, x) > 80 for i in v)] for k, v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['chicken']}
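
Since the question is about speed, one thing that might help is to clean and match each distinct ingredient name only once, because the same names (such as salt) tend to appear in many recipes. Below is a minimal sketch of that idea using functools.lru_cache. The helper name matching_tags and the sample names 'boiled egg' and 'Chicken (whole)' are made up for illustration, and a plain substring test stands in for fuzz.partial_ratio so the snippet runs without extra packages.

```python
from functools import lru_cache

# Hypothetical sample data modelled on the question.
recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'boiled egg', 'salt'],
    12: ['Chicken (whole)', 'rice', 'parsley'],
}
non_repeatable_ingredients = ('egg', 'chicken', 'beef')


@lru_cache(maxsize=None)
def matching_tags(ingredient_name):
    """Return the tags found in one ingredient name (computed once per distinct name)."""
    cleaned = (ingredient_name.replace(',', '').replace('(', '')
               .replace(')', '').lower().strip())
    # Assumption: a substring test replaces fuzz.partial_ratio(...) > 80 here.
    return tuple(tag for tag in non_repeatable_ingredients if tag in cleaned)


reduced_recipe_dictionary = {
    recipe_id: [tag for name in names for tag in matching_tags(name)]
    for recipe_id, names in recipe_dictionary.items()
}
print(reduced_recipe_dictionary)
```

If you keep the fuzzy comparison, the same caching should still apply: only the body of matching_tags would change.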

Upvotes: 0

W Stokvis

Reputation: 1439

Using a combination of regex and a defaultdict, you can get exactly what you're looking for. This approach uses regex to reduce the number of for loops needed.

Note I've adjusted key 12 to show that it will get both matches.

import re
from collections import defaultdict

recipe_dictionary  = { 134: ['salt', 'chicken', 'tomato paste canned'],
                        523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
                        12: ['whole chicken', 'rice', 'parsley', 'egg']}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']
# Build one alternation pattern: (egg|chicken|beef)
non_repeat = '(' + '|'.join(non_repeatable_ingredients) + ')'

d = defaultdict(list)
for k, j in recipe_dictionary.items():
    for i in j:
        # Record the first non-repeatable ingredient found in this name
        m = re.search(non_repeat, i)
        if m:
            d[k].append(m.group(1))
print(dict(d))
# {134: ['chicken'], 523: ['egg'], 12: ['chicken', 'egg']}
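
A possible refinement of the regex approach, sketched under a few assumptions: the pattern is compiled once, each tag is passed through re.escape in case it contains special characters, and word boundaries keep egg from matching inside eggplant. findall is used so that a single name such as 'chicken and egg salad' can yield more than one tag. Recipe 77 is a made-up entry added only to show those two cases.

```python
import re

recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
    12: ['whole chicken', 'rice', 'parsley', 'egg'],
    77: ['chicken and egg salad', 'eggplant'],  # hypothetical extra recipe
}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']

# Compile once: \b(?:egg|chicken|beef)\b, case-insensitive.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, non_repeatable_ingredients)) + r')\b',
    re.IGNORECASE,
)

reduced = {}
for recipe_id, ingredients in recipe_dictionary.items():
    found = []
    for ingredient in ingredients:
        # findall returns every tag that occurs in this ingredient name
        for tag in pattern.findall(ingredient):
            tag = tag.lower()
            if tag not in found:  # keep each tag once per recipe
                found.append(tag)
    reduced[recipe_id] = found
print(reduced)
```

The trade-off is that word boundaries also stop egg from matching eggs, so plurals would need to be added to the tag list or handled separately.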

Upvotes: 0

allardbrain

Reputation: 599

How about using sets instead of lists, since each recipe has unique ingredients and order doesn't matter?

Membership tests on a set take O(1) time on average, whereas on a list they take O(n) time.

For example:

recipe_dictionary = { 
    134: set(['salt', 'chicken', 'tomato paste canned']),
    523: set(['toast whole grain', 'feta cheese', 'egg', 'salt']),
    12: set(['chicken', 'rice', 'parsley'])
}

non_repeatable_ingredients = set(['egg', 'chicken', 'beef'])

You can test for an element's presence in a set like this:

for ingredient in recipe_dictionary[134]:
    if ingredient in non_repeatable_ingredients:
        pass  # do something with the match
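
With sets you can also skip the explicit loop and use the intersection operator. Here is a rough sketch of that, plus a word-level variant for names that are not clean. The 'boiled egg' entry and the helper name words are assumptions added for illustration, and splitting on whitespace is only one possible way to break names into words.

```python
recipe_dictionary = {
    134: {'salt', 'chicken', 'tomato paste canned'},
    523: {'toast whole grain', 'feta cheese', 'boiled egg', 'salt'},
    12: {'chicken', 'rice', 'parsley'},
}
non_repeatable_ingredients = {'egg', 'chicken', 'beef'}

# Exact matches only: 'boiled egg' is not equal to 'egg', so recipe 523 comes back empty.
exact = {
    recipe_id: ingredients & non_repeatable_ingredients
    for recipe_id, ingredients in recipe_dictionary.items()
}


def words(ingredients):
    """Split every ingredient name into lowercase words (hypothetical helper)."""
    return {word for name in ingredients for word in name.lower().split()}


# Word-level matches: 'boiled egg' contributes the word 'egg'.
by_word = {
    recipe_id: words(ingredients) & non_repeatable_ingredients
    for recipe_id, ingredients in recipe_dictionary.items()
}
print(exact)
print(by_word)
```

The word-level version handles names like boiled egg, but it will not catch misspellings the way a fuzzy ratio would.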

Upvotes: 1
