Reputation: 35
I want to create a cross-tab with the following table:
id: category:
ab12345 dog
ab12345 dog
fhakeiik cat
901 cat
dfds1 cat
ab12345 cat
12345 mouse
dfds1 mouse
I have tried doing a on this, but it doesn't work. Does anyone have any suggestions?
Upvotes: 1
Views: 114
Reputation: 260790
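You can build one set of ids per category and compare every ordered pair of categories. For a pair (a, b), 1 - len(a-b)/len(a) is the share of a's ids that also appear in b, which is the same as the size of the intersection divided by the size of a: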
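from itertools import product
import pandas as pd
# one frozenset of distinct ids per category
s = df.groupby('category')['id'].agg(frozenset)
idx = pd.MultiIndex.from_product([s.index]*2)
# share of a's ids also found in b, for every ordered pair (a, b)
df2 = pd.Series([1-len(a-b)/len(a) for a,b in product(s, repeat=2)], index=idx).unstack(0)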
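Output (each column is the reference category; a cell is the share of that column's ids that also occur in the row's category):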
category cat dog mouse
category
cat 1.00 1.0 0.5
dog 0.25 1.0 0.0
mouse 0.25 0.0 1.0
# optional: show the matrix as an annotated heatmap
import seaborn as sns
ax = sns.heatmap(df2, vmin=0, vmax=1, annot=True)
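For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's table is loaded as a DataFrame with string columns `id` and `category`; the variable name `overlap` and the level names `a` and `b` are only illustrative. It writes the ratio as an explicit intersection, `len(a & b) / len(a)`, which should give the same matrix as the difference form above.

```python
from itertools import product

import pandas as pd

# Sample data copied from the question; ids are kept as strings (assumption).
df = pd.DataFrame({
    "id": ["ab12345", "ab12345", "fhakeiik", "901", "dfds1",
           "ab12345", "12345", "dfds1"],
    "category": ["dog", "dog", "cat", "cat", "cat", "cat", "mouse", "mouse"],
})

# One frozenset of distinct ids per category (duplicate rows collapse).
s = df.groupby("category")["id"].agg(frozenset)

# For every ordered pair (a, b): share of a's ids that are also in b.
idx = pd.MultiIndex.from_product([s.index] * 2, names=["a", "b"])
overlap = pd.Series(
    [len(a & b) / len(a) for a, b in product(s, repeat=2)],
    index=idx,
).unstack("a")  # columns: reference category a, rows: category b

print(overlap)
```

Note that the matrix is not symmetric: dog's only id also appears in cat, so the dog column has 1.0 in the cat row, while only one of cat's four ids appears in dog.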
Upvotes: 1