Reputation: 962
I have a Pandas dataframe df
which has these columns -
Quiz Question K
1 1 XYZ-01
1 2 XXY-07
1 2 XXP-05
2 1 PQR-02
2 1 WXY-09
2 2 PQR-09
2 3 XXY-07
2 3 XXP-05
I want to make a dictionary that has all unique combinations of K
s in the same Quiz as the keys, and their frequency of occurrence as the value. For eg. for the above df, dict_combinations
should look something like -
{('XYZ-01', 'XXY-07'): 2, ('XXY-07', 'XXP-05'): 2, ('XYZ-01', 'XXP-05'): 1, # for Quiz 1
('PQR-02', 'WXY-09'): 1, ('PQR-02', 'PQR-09'): 1, ('PQR-02', 'XXY-07'): 1, ('PQR-02', 'XYZ-01'): 1, # for Quiz 2
('WXY-09', 'PQR-09'): 1, ('WXY-09', 'XXY-07'): 1, ('WXY-09', 'XXP-05'): 1, ('PQR-09', 'XXY-07'): 1,
('PQR-09', 'XXP-05'): 1}
How can I do this?
Upvotes: 2
Views: 485
Reputation: 863166
Use DataFrame.dropna
for remove possible missing values, then DataFrame.sort_values
and create combinations of 2
per groups in lambda function, flatten nested tuples by Series.explode
and last count by Series.value_counts
with convert to dictionary:
from itertools import combinations
d = (df.dropna(subset=['K'])
.sort_values(['Quiz','K'])
.groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
.explode()
.value_counts()
.to_dict()
)
print (d)
{('XXP-05', 'XXY-07'): 2, ('WXY-09', 'XXY-07'): 1, ('PQR-02', 'PQR-09'): 1,
('PQR-09', 'WXY-09'): 1, ('XXY-07', 'XYZ-01'): 1, ('PQR-02', 'XXY-07'): 1,
('PQR-09', 'XXP-05'): 1, ('PQR-02', 'WXY-09'): 1, ('PQR-09', 'XXY-07'): 1,
('WXY-09', 'XXP-05'): 1, ('PQR-02', 'XXP-05'): 1, ('XXP-05', 'XYZ-01'): 1}
Another idea is use Counter
for dictionary:
from itertools import combinations
from collections import Counter
L = [tuple(sorted(y)) for i,x in df.dropna(subset=['K']).groupby('Quiz')
for y in combinations(x['K'], 2)]
d = dict(Counter(L))
print (d)
{('XXY-07', 'XYZ-01'): 1, ('XXP-05', 'XYZ-01'): 1, ('XXP-05', 'XXY-07'): 2,
('PQR-02', 'WXY-09'): 1, ('PQR-02', 'PQR-09'): 1, ('PQR-02', 'XXY-07'): 1,
('PQR-02', 'XXP-05'): 1, ('PQR-09', 'WXY-09'): 1, ('WXY-09', 'XXY-07'): 1,
('WXY-09', 'XXP-05'): 1, ('PQR-09', 'XXY-07'): 1, ('PQR-09', 'XXP-05'): 1}
EDIT: How working explode
here:
It flatten nested lists of tuples in Series
:
s = (df.dropna(subset=['K'])
.sort_values(['Quiz','K'])
.groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
# .explode()
# .value_counts()
# .to_dict()
)
print (s)
Quiz
1 [(XXP-05, XXY-07), (XXP-05, XYZ-01), (XXY-07, ...
2 [(PQR-02, PQR-09), (PQR-02, WXY-09), (PQR-02, ...
Name: K, dtype: object
s = (df.dropna(subset=['K'])
.sort_values(['Quiz','K'])
.groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
.explode()
)
print (s)
Quiz
1 (XXP-05, XXY-07)
1 (XXP-05, XYZ-01)
1 (XXY-07, XYZ-01)
2 (PQR-02, PQR-09)
2 (PQR-02, WXY-09)
2 (PQR-02, XXP-05)
2 (PQR-02, XXY-07)
2 (PQR-09, WXY-09)
2 (PQR-09, XXP-05)
2 (PQR-09, XXY-07)
2 (WXY-09, XXP-05)
2 (WXY-09, XXY-07)
2 (XXP-05, XXY-07)
Name: K, dtype: object
Upvotes: 1