Reputation: 2573
I have a data frame as follows:
userID Correct
0 1050 F
1 1050 T
2 1050 T
3 1050 F
4 1050 F
5 1050 F
6 1050 F
7 1050 F
8 1050 F
9 1050 F
10 1051 F
11 1051 F
12 1051 F
13 1051 F
14 1051 F
15 1051 T
16 1051 F
17 1051 F
18 1051 F
19 1051 T
What I want to do is count the number of T's in the "Correct" column for every user. That is, after grouping the data frame by userID, I want a column that holds the number of T's for that user.
Here is what I have done, but it is clearly wrong:
df.groupby('userID').agg({'Correct': lambda x: (x == T).count()})
Upvotes: 3
Views: 108
Reputation: 323226
This also handles a user whose values are all 'F', returning 0 for that user :)
df1.groupby('userID').Correct.apply(lambda x : len(x[x=='T']))
Out[371]:
userID
1050 2
1051 0
Input data:
df1
Out[372]:
userID Correct
0 1050 F
1 1050 T
2 1050 T
3 1050 F
4 1050 F
5 1050 F
6 1050 F
7 1050 F
8 1050 F
9 1050 F
10 1051 F
11 1051 F
12 1051 F
13 1051 F
14 1051 F
15 1051 F
16 1051 F
17 1051 F
18 1051 F
19 1051 F
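As a side note, the same result can probably be reached without apply by grouping a boolean mask directly. This is only a sketch: the DataFrame is rebuilt by hand from the input data above, and the name result is my own.

```python
import pandas as pd

# Input data from this answer: user 1051 has no 'T' rows.
df1 = pd.DataFrame({
    "userID": [1050] * 10 + [1051] * 10,
    "Correct": list("FTTFFFFFFF") + list("F" * 10),
})

# Build the boolean mask first, then group it by userID and sum.
# No rows are filtered out, so every userID keeps an entry.
result = df1["Correct"].eq("T").groupby(df1["userID"]).sum()
print(result.tolist())  # [2, 0]
```

Because the mask has one entry per original row, a user with only 'F' values still shows up with a count of 0.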
Upvotes: 2
Reputation: 862406
You are really close; use sum, which counts the True values:
df1 = df.groupby('userID').agg({'Correct': lambda x: (x == 'T').sum()})
print (df1)
Correct
userID
1050 2
1051 2
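To see why count gave the wrong answer in the question, here is a small sketch comparing the two. It assumes the question's data rebuilt by hand; the names grouped, counts and sums are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question: two users, ten rows each.
df = pd.DataFrame({
    "userID": [1050] * 10 + [1051] * 10,
    "Correct": list("FTTFFFFFFF") + list("FFFFFTFFFT"),
})

grouped = df.groupby("userID")["Correct"]

# count() counts non-null entries, so the False values are counted too.
counts = grouped.agg(lambda x: (x == "T").count())

# sum() treats True as 1 and False as 0, so only the matches are counted.
sums = grouped.agg(lambda x: (x == "T").sum())

print(counts.tolist())  # [10, 10]
print(sums.tolist())    # [2, 2]
```

Note that the comparison also has to be against the string 'T', not a bare name T.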
But it is better to filter first and then count:
df1 = df[df['Correct'] == 'T'].groupby('userID').size().to_frame('Correct')
print (df1)
Correct
userID
1050 2
1051 2
To add 0 for each userID with no T, add reindex:
df1 = (df[df['Correct'] == 'T'].groupby('userID')
.size()
.reindex(df['userID'].unique(), fill_value=0)
.to_frame('Correct'))
print (df1)
Correct
userID
1050 2
1051 2
333 0
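The data behind the 333 row is not shown, so here is a self-contained sketch of the same idea. The third user 333 with only 'F' rows is my assumption, as are the names filtered and restored.

```python
import pandas as pd

# Hypothetical data: the question's two users plus a user 333 with only 'F'.
df = pd.DataFrame({
    "userID": [1050] * 10 + [1051] * 10 + [333] * 3,
    "Correct": list("FTTFFFFFFF") + list("FFFFFTFFFT") + list("FFF"),
})

# Filtering first drops every row of user 333, so 333 is absent here.
filtered = df[df["Correct"] == "T"].groupby("userID").size()
print(333 in filtered.index)  # False

# reindex brings back every userID and fills the missing ones with 0.
restored = (filtered
            .reindex(df["userID"].unique(), fill_value=0)
            .to_frame("Correct"))
print(restored["Correct"].tolist())  # [2, 2, 0]
```

The reindexed result follows the order of df['userID'].unique(), which is why 333 ends up last.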
Upvotes: 3