I have a number of dictionaries that contain 10 keys. Values are sampled from a list of elements with replacement:
elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
in such a way that the frequency of each element can vary from 0 to 10.
d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
How can I obtain a list that contains the frequency of value occurrences such as:
frequencies = [('S1', 3), ('S2', 1), ('S3', 2), ('S5', 2), ('S7', 1), ('S10', 1)]
Lastly, I want to map these to a list, where each index corresponds to 'S1', 'S2', 'S3', ..., such that the above frequencies would appear as:
[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
That is, the frequency of 'S1' is 3, the frequency of 'S2' is 1, the frequency of 'S3' is 2, etc.
Upvotes: 0
Views: 559
You could use a collections.Counter to count the frequency of the dictionary values.
from collections import Counter
d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
counter = Counter(d.values())
print(counter)
Output
Counter({'S1': 3, 'S5': 2, 'S3': 2, 'S2': 1, 'S10': 1, 'S7': 1})
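The question also asks for a list of (label, count) pairs ordered like the elements list. One way to get it from the counter is sketched below; it assumes the labels of interest are exactly those in elements and relies on Counter returning 0 for keys it has not seen. The name frequencies is taken from the question.

```python
from collections import Counter

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
counter = Counter(d.values())

# Walk the labels in their desired order and keep only those that were sampled.
frequencies = [(label, counter[label]) for label in elements if counter[label] > 0]
print(frequencies)
# [('S1', 3), ('S2', 1), ('S3', 2), ('S5', 2), ('S7', 1), ('S10', 1)]
```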
Since the sample labels could be arbitrary strings, we need to map each label to its desired index. Then we can create a zero-filled list and write in the counts for the labels that were actually sampled.
elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
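# Map each label to its position in the output list.
sample_index = {}
for i, label in enumerate(elements):
    sample_index[label] = i
# Start with zeros so labels that were never sampled stay at 0.
final_output = [0] * len(sample_index)
for sample_name, frequency in counter.items():
    final_output[sample_index[sample_name]] = frequency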
print(final_output)
Output
[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
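If the labels are already listed in the desired order, the index mapping can probably be skipped, because a Counter returns 0 for any key it has not seen. A shorter sketch under that assumption:

```python
from collections import Counter

elements = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10']
d = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
counter = Counter(d.values())

# Missing labels such as 'S4' are looked up as 0 instead of raising KeyError.
final_output = [counter[label] for label in elements]
print(final_output)
# [3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
```

This gives the same list as the loop above; the explicit mapping is still useful when the output positions do not follow the order of a ready-made list.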
Counter objects support the usual dictionary methods, but fromkeys and update work differently (see the docs).
[...] OrderedDict if you need this behavior.
Upvotes: 2
This may not be the best solution, but it works:
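import pandas as pd
import numpy as np
mydict = {1: 'S2', 2: 'S1', 3: 'S1', 4: 'S5', 5: 'S3', 6: 'S5', 7: 'S3', 8: 'S1', 9: 'S10', 10: 'S7'}
df = pd.DataFrame.from_dict(mydict, orient='index')
df.reset_index(inplace=True)
# Extract the numeric part of each label, e.g. 'S10' -> 10.
df['S'] = df.iloc[:, 1].str.extract(r'(\d+)').astype(int)
df.sort_values(by='S', inplace=True)
# Note: the list length is the largest label seen, so it is shorter than 10 if 'S10' was not sampled.
output = [0] * max(df.S)
for s in range(max(df.S)):
    # Count the rows whose label number is s + 1; int() keeps plain Python integers in the list.
    count = int(np.sum(df.S == s + 1))
    output[s] = count
print(output)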
[3, 1, 2, 0, 2, 0, 1, 0, 0, 1]
Upvotes: 1
Here is the solution that I came up with for this problem.
Idea behind
Solution below
dictionary = {1: 'S1', 2: 'S8', 3: 'S1', 4: 'S2', 5: 'S4', 6: 'S1', 7: 'S2', 8: 'S3', 9: 'S7', 10: 'S7'}
# Count how often each value occurs. Iterating over the values avoids
# looking up key 0, which does not exist and would add a None entry.
seenMap = {}
for value in dictionary.values():
    if value not in seenMap:
        seenMap[value] = 1
    else:
        seenMap[value] += 1
# There are 10 possible labels, 'S1' through 'S10', regardless of how many were seen.
numLabels = 10
results = []
for counter in range(1, numLabels + 1):
    index = "S" + str(counter)
    occurrence = seenMap.get(index)
    if occurrence is None:
        # Label was never sampled.
        results.append(0)
    else:
        results.append(occurrence)
print(results)
Upvotes: 1