Reputation: 357

How to get frequency of dictionary values and map to fixed ordering of all potential values?

I have a number of dictionaries that contain 10 keys. Values are sampled from a list of elements with replacement:

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']

in such a way that the frequency of each element can vary from 0 to 10.

d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}

How can I obtain a list that contains the frequency of value occurrences such as:

frequencies = [('S1', 3), ('S2', 1), ('S3', 2), ('S5', 2), ('S7', 1), ('S10', 1)]

Lastly, I want to map these to a list, where each index corresponds to 'S1', 'S2', 'S3', ..., such that the above frequencies would appear as:

[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]

That is, the frequency of 'S1' is 3, the frequency of 'S2' is 1, the frequency of 'S3' is 2, etc.

Upvotes: 0

Answers (3)

Christopher Peisert

Reputation: 24134

Get frequency of values

You could use a collections.Counter to count the frequency of the dictionary values.

from collections import Counter

d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
counter = Counter(d.values())

print(counter)

Output

Counter({'S1': 3, 'S5': 2, 'S3': 2, 'S2': 1, 'S10': 1, 'S7': 1})

Map frequencies to a list (index corresponds to sample label)

Since the sample labels could be arbitrary strings, we need to map each label to its desired index. Then we can create a zero-filled list and write in the counts for the samples that were actually drawn.

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
sample_index = {}

for i, label in enumerate(elements):
    sample_index[label] = i

final_output = [0] * len(sample_index)
for sample_name, frequency in counter.items():
    final_output[sample_index[sample_name]] = frequency

print(final_output)

Output

[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
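
As a shorter variant: a Counter returns 0 for a key it has never seen instead of raising KeyError, so the index map can be skipped when every label of interest is listed in elements. A minimal sketch, where the names ordered_counts and frequencies are only illustrative:

```python
from collections import Counter

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}

counter = Counter(d.values())

# Looking up a missing label yields 0, so absent labels need no special handling.
ordered_counts = [counter[label] for label in elements]

# (label, count) pairs in the order of `elements`, skipping labels that never occurred.
frequencies = [(label, counter[label]) for label in elements if counter[label]]

print(ordered_counts)  # [3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
print(frequencies)     # [('S1', 3), ('S2', 1), ('S3', 2), ('S5', 2), ('S7', 1), ('S10', 1)]
```

The second list matches the frequencies format asked for in the question, and both follow the order of elements rather than the order in which values were first seen.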

Notes

Counter objects support the usual dictionary methods, with two exceptions: fromkeys is not implemented, and update adds to the existing counts instead of replacing them (see the docs).
Since Python 3.7, dictionaries are guaranteed to preserve insertion order. In earlier versions, use collections.OrderedDict if you need this behavior.
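
To illustrate the first note, here is a small sketch of how update and fromkeys behave on a Counter compared with a plain dict. The variable names are made up for the example:

```python
from collections import Counter

counts = Counter(['S1', 'S1', 'S2'])
counts.update(['S1', 'S3'])  # Counter.update adds to the existing counts
print(counts['S1'], counts['S3'])  # 3 1

plain = {'S1': 2}
plain.update({'S1': 1})  # dict.update replaces the stored value
print(plain['S1'])  # 1

# Counter.fromkeys is deliberately not implemented.
try:
    Counter.fromkeys(['S1', 'S2'], 0)
    fromkeys_supported = True
except NotImplementedError:
    fromkeys_supported = False
print(fromkeys_supported)  # False
```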

Upvotes: 2

Xin Niu

Reputation: 635

This may not be the best solution, but it works:

import pandas as pd
import numpy as np
    
mydict = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}

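df = pd.DataFrame.from_dict(mydict, orient='index')

df.reset_index(inplace=True)
# Extract the numeric part of each label ('S10' -> 10); the raw string avoids an invalid-escape warning
df['S'] = df.iloc[:, 1].str.extract(r'(\d+)', expand=False).astype(int)
df.sort_values(by='S', inplace=True)

# Use the fixed number of labels (S1..S10) rather than max(df.S),
# which would shorten the list whenever 'S10' is not sampled
n_labels = 10
output = [0] * n_labels
for s in range(n_labels):
    # int() converts the NumPy integer so the list prints as plain numbers
    output[s] = int(np.sum(df.S == s + 1))
print(output)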

Output

[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
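
If pandas is already in use, the same result can probably be reached without the loop by counting the values and reindexing against the fixed label list. This is only a sketch: it assumes the elements list from the question, and counts is an illustrative name:

```python
import pandas as pd

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
mydict = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}

# value_counts() gives one count per distinct label that occurs.
counts = pd.Series(mydict).value_counts()

# reindex() puts the counts in the order of `elements` and fills unseen labels with 0.
output = counts.reindex(elements, fill_value=0).tolist()
print(output)
```

This does not rely on the labels having a numeric suffix, so it should also work for arbitrary label strings.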

Upvotes: 1

Ntshembo Hlongwane

Reputation: 1051

Here is the solution that I came up with for this problem.

Idea behind

First, loop through the dictionary values and tally them in a second dictionary, so that each distinct value maps to its number of occurrences.
Second, walk through the labels in their fixed order and append each label's count to the results list, using 0 for labels that never occurred.
To get the results ordered from S1 to Sn, build each label from a counter that runs from 1 to n, as in the code below.

Solution below

dictionary = {1: 'S1', 2: 'S8', 3: 'S1', 4: 'S2', 5: 'S4', 6: 'S1', 7: 'S2', 8: 'S3', 9: 'S7', 10: 'S7'}

# Tally how often each value occurs
seenMap = {}
for value in dictionary.values():
    if value not in seenMap:
        seenMap[value] = 1
    else:
        seenMap[value] += 1

# Number of possible labels (S1..S10); fixed, so the length of the
# result does not depend on which values happened to be sampled
numLabels = 10
counter = 1
results = []

while counter <= numLabels:
    index = "S" + str(counter)
    occurrence = seenMap.get(index)
    if occurrence is None:
        results.append(0)
    else:
        results.append(occurrence)
    counter += 1

print(results)