Reputation: 2333
I have a dataframe df:
AID JID CID
0 1 A NaN
1 1 A NaN
2 1 B NaN
3 1 NaN X
4 3 A NaN
5 4 NaN NaN
6 4 C X
7 5 C Y
8 5 C X
9 6 A NaN
10 6 B NaN
I want to calculate how many times each AID has used each JID or CID value.
The resulting dataframe should look like this, with the AID values as the index and the JID and CID values as the columns:
A B C X Y
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0
I know how to do this by looping and counting manually, but I was wondering what a more efficient way would be.
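For reference, the manual-loop baseline might look something like the sketch below (the DataFrame is reconstructed from the sample data in the question; the `Counter`-based bookkeeping is one possible way to do the counting by hand):

```python
import pandas as pd
import numpy as np
from collections import Counter

df = pd.DataFrame({
    "AID": [1, 1, 1, 1, 3, 4, 4, 5, 5, 6, 6],
    "JID": ["A", "A", "B", np.nan, "A", np.nan, "C", "C", "C", "A", "B"],
    "CID": [np.nan, np.nan, np.nan, "X", np.nan, np.nan, "X", "Y", "X",
            np.nan, np.nan],
})

# Count every non-NaN JID/CID value per AID, one row at a time.
counts = {}
for _, row in df.iterrows():
    c = counts.setdefault(row["AID"], Counter())
    for col in ("JID", "CID"):
        if pd.notna(row[col]):
            c[row[col]] += 1
```

This works, but iterating row by row with `iterrows` is exactly the kind of loop the vectorized answers below avoid.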
Upvotes: 0
Views: 37
Reputation: 353149
I'd melt and then use pivot_table:
In [80]: d2 = pd.melt(df, id_vars="AID")
In [81]: d2.pivot_table(index="AID", columns="value", values="variable",
aggfunc="count", fill_value=0)
Out[81]:
value A B C X Y
AID
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0
This works because melt "flattens" the dataframe into something where we can more easily access the values together, and pivot_table is for exactly the type of aggregation you have in mind:
In [90]: pd.melt(df, "AID")
Out[90]:
AID variable value
0 1 JID A
1 1 JID A
2 1 JID B
3 1 JID NaN
4 3 JID A
[... skipped]
17 4 CID X
18 5 CID Y
19 5 CID X
20 6 CID NaN
21 6 CID NaN
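Putting the two steps together as a self-contained script (the DataFrame is reconstructed from the sample data in the question):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "AID": [1, 1, 1, 1, 3, 4, 4, 5, 5, 6, 6],
    "JID": ["A", "A", "B", np.nan, "A", np.nan, "C", "C", "C", "A", "B"],
    "CID": [np.nan, np.nan, np.nan, "X", np.nan, np.nan, "X", "Y", "X",
            np.nan, np.nan],
})

# Step 1: melt JID/CID into long form -> columns AID, variable, value.
d2 = pd.melt(df, id_vars="AID")

# Step 2: count occurrences of each value per AID. NaN values are
# dropped automatically when "value" is used as the pivot columns.
result = d2.pivot_table(index="AID", columns="value",
                        values="variable", aggfunc="count", fill_value=0)
```

Note that rows where both JID and CID are NaN (like AID 4's first row) simply contribute nothing, which is what you want here.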
Upvotes: 3
Reputation: 862751
You can first create a Series with stack, then use groupby with value_counts, and finally reshape with unstack:
df = df.set_index('AID').stack().groupby(level=0).value_counts().unstack(1, fill_value=0)
print(df)
A B C X Y
AID
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0
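As a runnable sketch, with the chain broken onto separate lines for readability and the sample DataFrame reconstructed from the question (the result is bound to `out` rather than overwriting `df`, which is a stylistic choice):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "AID": [1, 1, 1, 1, 3, 4, 4, 5, 5, 6, 6],
    "JID": ["A", "A", "B", np.nan, "A", np.nan, "C", "C", "C", "A", "B"],
    "CID": [np.nan, np.nan, np.nan, "X", np.nan, np.nan, "X", "Y", "X",
            np.nan, np.nan],
})

out = (df.set_index("AID")
         .stack()                  # drops NaN; Series indexed by (AID, column)
         .groupby(level=0)         # group by AID
         .value_counts()           # count each value within its AID group
         .unstack(1, fill_value=0))  # spread the values out into columns
```

The key detail is that stack drops NaN entries by default, so the missing JID/CID cells fall out of the counts without any explicit filtering.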
Upvotes: 1