harry04
harry04

Reputation: 962

Map frequency of occurrences in a Pandas column to a dictionary

I have a Pandas dataframe df which has these columns -

Quiz          Question         K
  1              1           XYZ-01
  1              2           XXY-07
  1              2           XXP-05
  2              1           PQR-02
  2              1           WXY-09
  2              2           PQR-09
  2              3           XXY-07
  2              3           XXP-05

I want to make a dictionary that has all unique combinations of Ks in the same Quiz as the keys, and their frequency of occurrence as the value. For eg. for the above df, dict_combinations should look something like -

{('XYZ-01', 'XXY-07'): 2, ('XXY-07', 'XXP-05'): 2, ('XYZ-01', 'XXP-05'): 1,       # for Quiz 1
 ('PQR-02', 'WXY-09'): 1, ('PQR-02', 'PQR-09'): 1, ('PQR-02', 'XXY-07'): 1, ('PQR-02', 'XYZ-01'): 1, # for Quiz 2 
 ('WXY-09', 'PQR-09'): 1, ('WXY-09', 'XXY-07'): 1, ('WXY-09', 'XXP-05'): 1, ('PQR-09', 'XXY-07'): 1,
 ('PQR-09', 'XXP-05'): 1}

How can I do this?

Upvotes: 2

Views: 485

Answers (1)

jezrael
jezrael

Reputation: 863166

Use DataFrame.dropna for remove possible missing values, then DataFrame.sort_values and create combinations of 2 per groups in lambda function, flatten nested tuples by Series.explode and last count by Series.value_counts with convert to dictionary:

from  itertools import combinations

d = (df.dropna(subset=['K'])
        .sort_values(['Quiz','K'])
        .groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
        .explode()
        .value_counts()
        .to_dict()
        )
print (d)
{('XXP-05', 'XXY-07'): 2, ('WXY-09', 'XXY-07'): 1, ('PQR-02', 'PQR-09'): 1,
 ('PQR-09', 'WXY-09'): 1, ('XXY-07', 'XYZ-01'): 1, ('PQR-02', 'XXY-07'): 1,
 ('PQR-09', 'XXP-05'): 1, ('PQR-02', 'WXY-09'): 1, ('PQR-09', 'XXY-07'): 1, 
 ('WXY-09', 'XXP-05'): 1, ('PQR-02', 'XXP-05'): 1, ('XXP-05', 'XYZ-01'): 1}

Another idea is use Counter for dictionary:

from  itertools import combinations
from collections import Counter

L = [tuple(sorted(y)) for i,x in df.dropna(subset=['K']).groupby('Quiz') 
                      for y in combinations(x['K'], 2)]
d = dict(Counter(L))
print (d)
{('XXY-07', 'XYZ-01'): 1, ('XXP-05', 'XYZ-01'): 1, ('XXP-05', 'XXY-07'): 2, 
 ('PQR-02', 'WXY-09'): 1, ('PQR-02', 'PQR-09'): 1, ('PQR-02', 'XXY-07'): 1, 
 ('PQR-02', 'XXP-05'): 1, ('PQR-09', 'WXY-09'): 1, ('WXY-09', 'XXY-07'): 1, 
 ('WXY-09', 'XXP-05'): 1, ('PQR-09', 'XXY-07'): 1, ('PQR-09', 'XXP-05'): 1}

EDIT: How working explode here:

It flatten nested lists of tuples in Series:

s = (df.dropna(subset=['K'])
        .sort_values(['Quiz','K'])
        .groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
        # .explode()
        # .value_counts()
        # .to_dict()
        )
print (s)
Quiz
1    [(XXP-05, XXY-07), (XXP-05, XYZ-01), (XXY-07, ...
2    [(PQR-02, PQR-09), (PQR-02, WXY-09), (PQR-02, ...
Name: K, dtype: object

s = (df.dropna(subset=['K'])
        .sort_values(['Quiz','K'])
        .groupby('Quiz')['K'].apply(lambda x: list(combinations(x, 2)))
        .explode()
        )
print (s)
Quiz
1    (XXP-05, XXY-07)
1    (XXP-05, XYZ-01)
1    (XXY-07, XYZ-01)
2    (PQR-02, PQR-09)
2    (PQR-02, WXY-09)
2    (PQR-02, XXP-05)
2    (PQR-02, XXY-07)
2    (PQR-09, WXY-09)
2    (PQR-09, XXP-05)
2    (PQR-09, XXY-07)
2    (WXY-09, XXP-05)
2    (WXY-09, XXY-07)
2    (XXP-05, XXY-07)
Name: K, dtype: object

Upvotes: 1

Related Questions