BKS
BKS

Reputation: 2333

generating a dataframe that counts instances from another in pandas

I have a dataframe df:

          AID   JID   CID
0         1     A     NaN
1         1     A     NaN
2         1     B     NaN
3         1     NaN   X
4         3     A     NaN
5         4     NaN   NaN
6         4     C     X
7         5     C     Y
8         5     C     X
9         6     A     NaN
10        6     B     NaN

I want to calculate how many times has AID used either JID or CID.

Resulting dataframe should be like this, where the index is the AID values and the columns are the CID and JID values:

      A     B     C    X    Y
1     2     1     0    1    0
3     1     0     0    0    0
4     0     0     1    1    0
5     0     0     2    1    1
6     1     1     0    0    0     

I know how to do it by looping and counting manually. But I was wondering what the more efficient way is?

Upvotes: 0

Views: 37

Answers (2)

DSM
DSM

Reputation: 353149

I'd melt and then use pivot_table:

In [80]: d2 = pd.melt(df, id_vars="AID")

In [81]: d2.pivot_table(index="AID", columns="value", values="variable", 
                        aggfunc="count", fill_value=0)
Out[81]: 
value  A  B  C  X  Y
AID                 
1      2  1  0  1  0
3      1  0  0  0  0
4      0  0  1  1  0
5      0  0  2  1  1
6      1  1  0  0  0

This works because melt "flattens" the dataframe into something where we can more easily access the values together, and pivot_table is for exactly the type of aggregation you have in mind:

In [90]: pd.melt(df, "AID")
Out[90]: 
   AID variable value
0    1      JID     A
1    1      JID     A
2    1      JID     B
3    1      JID   NaN
4    3      JID     A
[... skipped]
17    4      CID     X
18    5      CID     Y
19    5      CID     X
20    6      CID   NaN
21    6      CID   NaN

Upvotes: 3

jezrael
jezrael

Reputation: 862751

You can create first Series by stack and then groupby with value_counts. Last reshape by unstack:

df = df.set_index('AID').stack().groupby(level=0).value_counts().unstack(1, fill_value=0)
print (df)
     A  B  C  X  Y
AID               
1    2  1  0  1  0
3    1  0  0  0  0
4    0  0  1  1  0
5    0  0  2  1  1
6    1  1  0  0  0

Upvotes: 1

Related Questions