Reputation: 407
I have a situation where a user belongs to multiple categories:
UserID Category
1 A
1 B
2 A
3 A
4 C
2 C
4 A
A = 1,2,3,4
B = 1
C = 2,4
I want the crosstab which shows data like this using pandas:
A B C
A 4 1 2
B 1 2 0
C 2 0 2
I try:
df.groupby(UserID).agg(countDistinct('Category'))
I did the above but it returns 0 for elements not on the diagonal.
Upvotes: 3
Views: 275
Reputation: 862741
You can first create DataFrame
from lists a
, b
, c
. Then stack
and merge
it to original. Last use crosstab
:
a = [1,2,3,4]
b = [1]
c = [2,4]
df1 = pd.DataFrame({'A':pd.Series(a), 'B':pd.Series(b), 'C':pd.Series(c)})
print (df1)
A B C
0 1 1.0 2.0
1 2 NaN 4.0
2 3 NaN NaN
3 4 NaN NaN
df2 = df1.stack()
.reset_index(drop=True, level=0)
.reset_index(name='UserID')
.rename(columns={'index':'newCat'})
print (df2)
newCat UserID
0 A 1.0
1 B 1.0
2 C 2.0
3 A 2.0
4 C 4.0
5 A 3.0
6 A 4.0
df3 = pd.merge(df, df2, on='UserID')
print (pd.crosstab(df3.newCat, df3.Category))
Category A B C
newCat
A 4 1 2
B 1 1 0
C 2 0 2
Upvotes: 2