HimanAB

Reputation: 2573

Count the number of certain values in a data frame after group by

I have a data frame as follows:

    userID  Correct
0   1050    F
1   1050    T
2   1050    T
3   1050    F
4   1050    F
5   1050    F
6   1050    F
7   1050    F
8   1050    F
9   1050    F
10  1051    F
11  1051    F
12  1051    F
13  1051    F
14  1051    F
15  1051    T
16  1051    F
17  1051    F
18  1051    F
19  1051    T

What I want to do is count the number of T's in the "Correct" column for every user. That is, after grouping the data frame by userID, I want a column holding the number of T's for each user.

Here is what I have tried, but it is clearly wrong:

df.groupby('userID').agg({'Correct': lambda x: (x == T).count()})

Upvotes: 3

Views: 108

Answers (2)

BENY

Reputation: 323226

This also handles a user whose values are all 'F' and returns 0 for that user :)

df1.groupby('userID').Correct.apply(lambda x : len(x[x=='T']))

Out[371]: 
userID
1050    2
1051    0

Input data:

df1
Out[372]: 
    userID Correct
0     1050       F
1     1050       T
2     1050       T
3     1050       F
4     1050       F
5     1050       F
6     1050       F
7     1050       F
8     1050       F
9     1050       F
10    1051       F
11    1051       F
12    1051       F
13    1051       F
14    1051       F
15    1051       F
16    1051       F
17    1051       F
18    1051       F
19    1051       F

Upvotes: 2

jezrael

Reputation: 862406

You are really close; use the sum of the True values:

df1 = df.groupby('userID').agg({'Correct': lambda x: (x == 'T').sum()})
print (df1)
        Correct
userID         
1050          2
1051          2
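
To see why sum works where count does not, here is a minimal sketch that rebuilds the question's data (the names `mask`, `counts` and `sums` are only for illustration). Comparing with 'T' gives a boolean Series; count() counts every non-null entry, True or False, while sum() adds up only the True values:

```python
import pandas as pd

# Rebuild the question's data: 10 rows per user, 2 of them 'T'
df = pd.DataFrame({
    'userID': [1050] * 10 + [1051] * 10,
    'Correct': list('FTTFFFFFFF') + list('FFFFFTFFFT'),
})

mask = df['Correct'] == 'T'   # boolean Series: True where the value is 'T'

# count() counts all non-null entries, so it returns the group size
counts = mask.groupby(df['userID']).count()
# sum() treats True as 1 and False as 0, so it returns the number of 'T's
sums = mask.groupby(df['userID']).sum()

print(counts.to_dict())
print(sums.to_dict())
```

So count() in the original attempt would give 10 for both users, and sum() gives 2 for both.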

But it is better to filter first and then count:

df1 = df[df['Correct'] == 'T'].groupby('userID').size().to_frame('Correct')
print (df1)
        Correct
userID         
1050          2
1051          2
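
One caveat with filtering first: a user with no T at all has no rows left after the filter, so that user disappears from the result. A small sketch, assuming made-up data in which user 1051 has no T:

```python
import pandas as pd

# Hypothetical data (not the question's): user 1051 has no 'T'
df = pd.DataFrame({
    'userID': [1050, 1050, 1050, 1051, 1051],
    'Correct': ['F', 'T', 'T', 'F', 'F'],
})

# Filtering removes every row of user 1051 before the groupby runs
filtered = df[df['Correct'] == 'T'].groupby('userID').size()

print(filtered.to_dict())   # only users with at least one 'T' appear
```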

To add 0 for each userID with no T, add reindex:

df1 = (df[df['Correct'] == 'T'].groupby('userID')
                              .size()
                              .reindex(df['userID'].unique(), fill_value=0)
                              .to_frame('Correct'))
print (df1)
        Correct
userID         
1050          2
1051          2
333           0
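
If you would rather avoid the reindex, one possible alternative (a sketch, not one of the solutions above) is to build the boolean mask first and group it by userID. Every user stays in the result, so users with no T get 0 automatically. The data below is made up, with a hypothetical user 333 who has no T:

```python
import pandas as pd

# Hypothetical data: user 333 has no 'T' at all
df = pd.DataFrame({
    'userID': [1050, 1050, 1050, 1051, 1051, 333, 333],
    'Correct': ['F', 'T', 'T', 'T', 'F', 'F', 'F'],
})

# Series.eq gives a boolean mask; grouping it by userID keeps every user
df1 = (df['Correct'].eq('T')
                    .groupby(df['userID'])
                    .sum()
                    .to_frame('Correct'))
print(df1)
```

Note that groupby sorts the group keys by default, so the row order can differ from the reindex version, which follows the order of df['userID'].unique().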

Upvotes: 3
