bigmacattack

Reputation: 35

Creating cross tab of counts of common groups for different categories

I want to create a cross-tab from the following table:

  id:          category:
ab12345          dog
ab12345          dog
fhakeiik         cat 
901              cat
dfds1            cat
ab12345          cat
12345            mouse
dfds1            mouse

I have tried doing a on this, but it doesn't work. Does anyone have any suggestions?

Upvotes: 1

Views: 114

Answers (1)

mozway

Reputation: 260790

pairwise comparison

You can compare the sets of ids for every ordered pair of categories: for each pair (a, b), compute the fraction of a's ids that also appear in b.


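import pandas as pd
from itertools import product

# one frozenset of ids per category
s = df.groupby('category')['id'].agg(frozenset)

# for every ordered pair (a, b): fraction of a's ids also in b (columns = a, rows = b)
idx = pd.MultiIndex.from_product([s.index]*2)
df2 = pd.Series([1-len(a-b)/len(a) for a,b in product(s, repeat=2)], index=idx).unstack(0)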

Output:

category   cat  dog  mouse
category                  
cat       1.00  1.0    0.5
dog       0.25  1.0    0.0
mouse     0.25  0.0    1.0
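
If raw counts of shared ids are wanted rather than fractions, one possible alternative (not part of the approach above) is to build a 0/1 membership matrix with pd.crosstab and multiply it by its own transpose. This is a minimal sketch: the DataFrame is rebuilt here from the question's table with every id assumed to be a string, and the names membership and common are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (ids assumed to be strings).
df = pd.DataFrame({
    "id": ["ab12345", "ab12345", "fhakeiik", "901",
           "dfds1", "ab12345", "12345", "dfds1"],
    "category": ["dog", "dog", "cat", "cat",
                 "cat", "cat", "mouse", "mouse"],
})

# 0/1 membership matrix: rows are ids, columns are categories.
# clip(upper=1) keeps repeated (id, category) rows from counting twice.
membership = pd.crosstab(df["id"], df["category"]).clip(upper=1)

# Entry (a, b) is the number of distinct ids present in both a and b;
# the diagonal holds the number of distinct ids in each category.
common = membership.T @ membership
print(common)
```

With this data, cat and dog share one id (ab12345), cat and mouse share one (dfds1), and dog and mouse share none. Dividing each column of common by its diagonal entry should give the same fractions as df2.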

heatmap

import seaborn as sns 

ax = sns.heatmap(df2, vmin=0, vmax=1, annot=True)

Output: heatmap

Upvotes: 1
