thealchemist
thealchemist

Reputation: 407

Pandas: solve a crosstab issue

I have a situation where a user belongs to multiple categories:

UserID    Category
1         A
1         B
2         A
3         A
4         C
2         C
4         A

A = 1,2,3,4 
B = 1
C = 2,4

I want the crosstab which shows data like this using pandas:

  A  B  C
A 4  1  2

B 1  2  0

C 2  0  2

I try:

df.groupby(UserID).agg(countDistinct('Category'))

I did the above but it returns 0 for elements not on the diagonal.

Upvotes: 3

Views: 275

Answers (1)

jezrael
jezrael

Reputation: 862741

You can first create DataFrame from lists a, b, c. Then stack and merge it to original. Last use crosstab:

a = [1,2,3,4]
b = [1]
c = [2,4]

df1 = pd.DataFrame({'A':pd.Series(a), 'B':pd.Series(b), 'C':pd.Series(c)})
print (df1)
   A    B    C
0  1  1.0  2.0
1  2  NaN  4.0
2  3  NaN  NaN
3  4  NaN  NaN

df2 = df1.stack()
         .reset_index(drop=True, level=0)
         .reset_index(name='UserID')
         .rename(columns={'index':'newCat'})

print (df2)
  newCat  UserID
0      A     1.0
1      B     1.0
2      C     2.0
3      A     2.0
4      C     4.0
5      A     3.0
6      A     4.0

df3 = pd.merge(df, df2, on='UserID')

print (pd.crosstab(df3.newCat, df3.Category))
Category  A  B  C
newCat           
A         4  1  2
B         1  1  0
C         2  0  2

Upvotes: 2

Related Questions