Reputation: 1488

How to find the number of unique values per key in a Python dictionary

I have a dictionary like this:

{'drink': ["'57 Chevy with a White License Plate",
  "'57 Chevy with a White License Plate",
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '110 in the shade',
  '110 in the shade',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '155 Belmont',
  '155 Belmont',
  '155 Belmont',
  '155 Belmont',
  '24k nightmare',
  '24k nightmare',
  '24k nightmare',
  '24k nightmare',
  '252',
  '252'],
 'ingredient': ['Creme de Cacao',
  'Vodka',
  'Absolut Kurant',
  'Grand Marnier',
  'Grand Marnier',
  'Midori melon liqueur',
  'Malibu rum',
  'Amaretto',
  'Cranberry juice',
  'Pineapple juice',
  'Lager',
  'Tequila',
  'Malibu rum',
  'Light rum',
  '151 proof rum',
  'Dark Creme de Cacao',
  'Cointreau',
  'Milk',
  'Coconut liqueur',
  'Vanilla ice-cream',
  'Dark rum',
  'Light rum',
  'Vodka',
  'Orange juice',
  'Goldschlager',
  'Jägermeister',
  'Rumple Minze',
  '151 proof rum',
  '151 proof rum',
  'Wild Turkey']}

I would like to find the number of unique ingredients per drink, for example:

Drink '57 Chevy with a White License Plate has 2 unique ingredients
Drink 1-900-FUK-MEUP has 7 unique ingredients ('Absolut Kurant', 'Grand Marnier', 'Midori melon liqueur', 'Malibu rum', 'Amaretto', 'Cranberry juice', 'Pineapple juice')

out_dict = {'drink': ["'57 Chevy with a White License Plate", '1-900-FUK-MEUP'], 'unique_count': [2, 7]}

Could you please suggest how to get this done?

Upvotes: 0

Answers (3)

myamulla_ciencia

Reputation: 1488

Thanks for your input. Here is the version I settled on, which meets my requirement.

from itertools import groupby
from operator import itemgetter

# ex_dict is the dictionary shown in the question
pairs = list(zip(ex_dict['drink'], ex_dict['ingredient']))

# groupby only groups consecutive items, so sort by drink first
ingr_per_drink = {k: list(map(itemgetter(1), v))
                  for k, v in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))}

# number of distinct ingredients per drink
drink_counts = {drink: len(set(ingr)) for drink, ingr in ingr_per_drink.items()}

# reshape into two parallel lists
drink_group_sum = {'drink': [], 'unique_ingr': []}

for k, v in drink_counts.items():
    drink_group_sum['drink'].append(k)
    drink_group_sum['unique_ingr'].append(v)

drink_group_sum then looks like this:

{'drink': ["'57 Chevy with a White License Plate",
  '1-900-FUK-MEUP',
  '110 in the shade',
  '151 Florida Bushwacker',
  '155 Belmont',
  '24k nightmare',
  '252'],
 'unique_ingr': [2, 7, 2, 8, 4, 4, 2]}

Now I can pass this dict to the datatable Frame constructor to create a new frame, which displays as:

Out[4]: 
   | drink                                 unique_ingr
-- + ------------------------------------  -----------
 0 | '57 Chevy with a White License Plate            2
 1 | 1-900-FUK-MEUP                                  7
 2 | 110 in the shade                                2
 3 | 151 Florida Bushwacker                          8
 4 | 155 Belmont                                     4
 5 | 24k nightmare                                   4
 6 | 252                                             2

[7 rows x 2 columns]

Upvotes: 0

azro

Reputation: 54168

First, your structure is inappropriate. Let's build a better one: a dict that maps each drink to its list of ingredients,

{drinkKey: ingredientList}

Make pairs from your two lists ('drink', 'ingredient').
Group them on the first item, the drink, to get {drink: [(drink, ingredient), (drink, ingredient)]}.
Keep only the ingredient in each list.

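from itertools import groupby
from operator import itemgetter

pairs = list(zip(data['drink'], data['ingredient']))
ingr_per_drink = {k: list(map(itemgetter(1), v))
                  for k, v in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))}
for drink, ingredients in ingr_per_drink.items():
    # do whatever you want here, e.g. count the distinct ingredients
    print(drink, len(set(ingredients)))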
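
One detail worth spelling out: itertools.groupby only merges consecutive items with equal keys, which is why the pairs are sorted before grouping. Here is a minimal sketch with made-up sample pairs (the drink and ingredient values and the names unsorted_groups and sorted_groups are illustrative, not from the question):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: drink "A" appears in two non-adjacent positions.
pairs = [("A", "rum"), ("B", "gin"), ("A", "lime")]

# Without sorting, groupby starts a new group every time the key changes,
# so "A" ends up in two separate groups.
unsorted_groups = [(k, [ingr for _, ingr in g])
                   for k, g in groupby(pairs, key=itemgetter(0))]

# Sorting by drink first places equal keys next to each other.
sorted_groups = [(k, [ingr for _, ingr in g])
                 for k, g in groupby(sorted(pairs, key=itemgetter(0)),
                                     key=itemgetter(0))]

print(unsorted_groups)  # [('A', ['rum']), ('B', ['gin']), ('A', ['lime'])]
print(sorted_groups)    # [('A', ['rum', 'lime']), ('B', ['gin'])]
```

If the unsorted result were fed into a dict comprehension, the second "A" group would silently overwrite the first, so ingredients would be lost.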

Upvotes: 2

ExplodingGayFish

Reputation: 2897

You can try something like this:

from collections import defaultdict
data = {'drink': ["'57 Chevy with a White License Plate",
  "'57 Chevy with a White License Plate",
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '1-900-FUK-MEUP',
  '110 in the shade',
  '110 in the shade',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '151 Florida Bushwacker',
  '155 Belmont',
  '155 Belmont',
  '155 Belmont',
  '155 Belmont',
  '24k nightmare',
  '24k nightmare',
  '24k nightmare',
  '24k nightmare',
  '252',
  '252'],
 'ingredient': ['Creme de Cacao',
  'Vodka',
  'Absolut Kurant',
  'Grand Marnier',
  'Grand Marnier',
  'Midori melon liqueur',
  'Malibu rum',
  'Amaretto',
  'Cranberry juice',
  'Pineapple juice',
  'Lager',
  'Tequila',
  'Malibu rum',
  'Light rum',
  '151 proof rum',
  'Dark Creme de Cacao',
  'Cointreau',
  'Milk',
  'Coconut liqueur',
  'Vanilla ice-cream',
  'Dark rum',
  'Light rum',
  'Vodka',
  'Orange juice',
  'Goldschlager',
  'Jägermeister',
  'Rumple Minze',
  '151 proof rum',
  '151 proof rum',
  'Wild Turkey']}

result = defaultdict(set)
for drink, ingredient in zip(data['drink'], data['ingredient']):
    result[drink].add(ingredient)

for drink, unique_ingredient in result.items():
    print("{} has {} unique ingredients: {}".format(drink, len(unique_ingredient), list(unique_ingredient)))

Output:

'57 Chevy with a White License Plate has 2 unique ingredients: ['Creme de Cacao', 'Vodka']
1-900-FUK-MEUP has 7 unique ingredients: ['Malibu rum', 'Grand Marnier', 'Cranberry juice', 'Pineapple juice', 'Amaretto', 'Midori melon liqueur', 'Absolut Kurant']
110 in the shade has 2 unique ingredients: ['Lager', 'Tequila']
151 Florida Bushwacker has 8 unique ingredients: ['Milk', 'Malibu rum', 'Vanilla ice-cream', 'Light rum', 'Coconut liqueur', 'Dark Creme de Cacao', 'Cointreau', '151 proof rum']
155 Belmont has 4 unique ingredients: ['Light rum', 'Orange juice', 'Dark rum', 'Vodka']
24k nightmare has 4 unique ingredients: ['Jägermeister', '151 proof rum', 'Rumple Minze', 'Goldschlager']
252 has 2 unique ingredients: ['Wild Turkey', '151 proof rum']
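
If you need the two-list layout asked for in the question rather than printed lines, the same sets can be reshaped afterwards. This is only a sketch on a shortened, made-up sample in the question's layout; the name unique is illustrative, and out_dict and unique_count follow the question's wording:

```python
from collections import defaultdict

# Shortened, hypothetical sample in the same two-list layout as the question.
data = {
    'drink': ['252', '252', '155 Belmont', '155 Belmont', '155 Belmont'],
    'ingredient': ['151 proof rum', 'Wild Turkey',
                   'Dark rum', 'Light rum', 'Light rum'],
}

# Collect the distinct ingredients of each drink in a set.
unique = defaultdict(set)
for drink, ingredient in zip(data['drink'], data['ingredient']):
    unique[drink].add(ingredient)

# Dicts keep insertion order (Python 3.7+), so drinks appear in first-seen order.
out_dict = {
    'drink': list(unique),
    'unique_count': [len(ingredients) for ingredients in unique.values()],
}
print(out_dict)  # {'drink': ['252', '155 Belmont'], 'unique_count': [2, 2]}
```

No sorting is needed with this approach, because each ingredient is added to its drink's set regardless of where the drink appears in the list.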

Upvotes: 3
