Cybernetic

Reputation: 13334

Select random sample from list of dictionaries, with value condition

I have a list of dictionaries like this:

list_of_dicts = [
    {'db': 'redshift', 'table': 'metrics', 'prefix': 'abc_'},
    {'db': 'blueshift', 'table': 'colors', 'prefix': 'abc_'},
    {'db': 'orangeshift', 'table': 'people', 'prefix': 'def_'},
    {'db': 'greenshift', 'table': 'money', 'prefix': 'def_'},
    {'db': 'purpleshift', 'table': 'props', 'prefix': 'ghi_'},
    {'db': 'brownshift', 'table': 'stages', 'prefix': 'ghi_'},
...
]

How can I extract N dicts for each prefix? For example, suppose the list above were large and I wanted a list with 5 dicts per prefix. I would get back 15 dicts: 5 with the abc_ prefix, 5 with def_, and 5 with ghi_.

The expected output would be:

result = [
    {'db': 'redshift', 'table': 'metrics', 'prefix': 'abc_'},
    {'db': 'blueshift', 'table': 'colors', 'prefix': 'abc_'},
    {'db': 'orangeshift', 'table': 'people', 'prefix': 'abc_'},
    {'db': 'greenshift', 'table': 'money', 'prefix': 'abc_'},
    {'db': 'purpleshift', 'table': 'props', 'prefix': 'abc_'},
    {'db': 'redshift', 'table': 'metrics', 'prefix': 'def_'},
    {'db': 'blueshift', 'table': 'colors', 'prefix': 'def_'},
    {'db': 'orangeshift', 'table': 'people', 'prefix': 'def_'},
    {'db': 'greenshift', 'table': 'money', 'prefix': 'def_'},
    {'db': 'purpleshift', 'table': 'props', 'prefix': 'def_'},
    {'db': 'redshift', 'table': 'metrics', 'prefix': 'ghi_'},
    {'db': 'blueshift', 'table': 'colors', 'prefix': 'ghi_'},
    {'db': 'orangeshift', 'table': 'people', 'prefix': 'ghi_'},
    {'db': 'greenshift', 'table': 'money', 'prefix': 'ghi_'},
    {'db': 'purpleshift', 'table': 'props', 'prefix': 'ghi_'},
]

So 5 dicts for each distinct prefix have been randomly extracted from a large list of dicts.

Upvotes: 0

Views: 156

Answers (1)

Jean-François Fabre

Reputation: 140266

Group the dicts in a defaultdict, using the prefix value as the key and a list of the matching dicts as the value. Then draw n random elements from each list with random.sample (here I sampled 2 from each), and flatten the resulting lists with itertools.chain.from_iterable if needed.

import collections
import itertools
import random

list_of_dicts = [
    {'db': 'redshift', 'table': 'metrics', 'prefix': 'abc_'},
    {'db': 'blueshift', 'table': 'colors', 'prefix': 'abc_'},
    {'db': 'orangeshift', 'table': 'people', 'prefix': 'def_'},
    {'db': 'greenshift', 'table': 'money', 'prefix': 'def_'},
    {'db': 'purpleshift', 'table': 'props', 'prefix': 'ghi_'},
    {'db': 'brownshift', 'table': 'stages', 'prefix': 'ghi_'}
]

n = 2  # number of dicts to pick per prefix

# group the dicts by prefix
groups = collections.defaultdict(list)
for item in list_of_dicts:
    groups[item["prefix"]].append(item)

# pick n dicts from each group and flatten the result;
# only the grouped lists matter here, so iterate over values()
# note: random.sample raises ValueError if a group has fewer than n dicts
result = list(itertools.chain.from_iterable(random.sample(group, n) for group in groups.values()))

print(result)

One possible output:

[{'db': 'redshift', 'table': 'metrics', 'prefix': 'abc_'},
 {'db': 'blueshift', 'table': 'colors', 'prefix': 'abc_'},
 {'db': 'greenshift', 'table': 'money', 'prefix': 'def_'},
 {'db': 'orangeshift', 'table': 'people', 'prefix': 'def_'},
 {'db': 'brownshift', 'table': 'stages', 'prefix': 'ghi_'},
 {'db': 'purpleshift', 'table': 'props', 'prefix': 'ghi_'}]
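
For N = 5 as in the question, the same grouping idea could be wrapped in a function. This is only a minimal sketch: the name sample_per_prefix, its parameters, and the generated test data are hypothetical, not part of the original code. It also assumes that a prefix with fewer than n dicts should simply return all of them, because random.sample raises ValueError when asked for more items than the list holds.

```python
import collections
import random


def sample_per_prefix(dicts, n, key="prefix"):
    """Return up to n randomly chosen dicts for each distinct value of `key`.

    Hypothetical helper: name and signature are illustrative only.
    """
    groups = collections.defaultdict(list)
    for item in dicts:
        groups[item[key]].append(item)

    result = []
    for members in groups.values():
        # Cap the sample size at the group size so that small groups
        # do not make random.sample raise ValueError (an assumption
        # about the desired behaviour, not stated in the question).
        result.extend(random.sample(members, min(n, len(members))))
    return result


# Made-up data: 3 prefixes with 8 dicts each.
data = [
    {"db": f"db{i}", "table": f"t{i}", "prefix": prefix}
    for prefix in ("abc_", "def_", "ghi_")
    for i in range(8)
]

picked = sample_per_prefix(data, 5)
counts = collections.Counter(d["prefix"] for d in picked)
print(counts)
```

Which dicts are picked changes from run to run; call random.seed beforehand if a reproducible sample is needed.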

Upvotes: 1
