Reputation: 1439
Before I start, I want to make it clear that my question is different than Counting values that meet a condition and Want to count the number of values in a column that meet a condition.
Allow me to explain. Here is my df:
test = pd.DataFrame({'name':['joe','dan','betty','joe','dan','betty','joe','dan','betty','joe','dan','betty'],'points':[12,3,5,10,5,16,2,8,15,17,1,3]})
test
name points
0 joe 12
1 dan 3
2 betty 5
3 joe 10
4 dan 5
5 betty 16
6 joe 2
7 dan 8
8 betty 15
9 joe 17
10 dan 1
11 betty 3
What I am aiming to do is count how many times each person had less than 10 points and create a new column that consists of that value. I tried the following and got really close:
test['<10'] = test[test['points'] < 10].groupby('name')['points'].transform('count')
test
name points <10
0 joe 12 NaN
1 dan 3 4.0
2 betty 5 2.0
3 joe 10 NaN
4 dan 5 4.0
5 betty 16 NaN
6 joe 2 1.0
7 dan 8 4.0
8 betty 15 NaN
9 joe 17 NaN
10 dan 1 4.0
11 betty 3 2.0
I get the values I want, but since I subset the data frame to values <10, I am left with NaN's in the rows that were excluded. I almost have this figured out, but I would like to get those NaN values filled to show how many times each person had less than 10 points (ie joe should have 1, betty 2, and dan 4). Any help is appreciated, thanks!
Upvotes: 3
Views: 8068
Reputation: 25239
Your code picks only rows where point < 10
. The logic is you need to sum
directly on the mask of point < 10
. True
is equivalent to 1
and False
is 0
. So, sum on point < 10
will return the count of True
. From this logic, you only need to groupby and do transform
to populate the count of True
to each group
test['<10'] = (test['points'] < 10).groupby(test['name']).transform('sum')
Out[84]:
name points <10
0 joe 12 1.0
1 dan 3 4.0
2 betty 5 2.0
3 joe 10 1.0
4 dan 5 4.0
5 betty 16 2.0
6 joe 2 1.0
7 dan 8 4.0
8 betty 15 2.0
9 joe 17 1.0
10 dan 1 4.0
11 betty 3 2.0
Upvotes: 4
Reputation: 323226
Fix your code remove the transform
and add reindex
test['<10']=test[test['points'] < 10].groupby('name')['points'].count().reindex(test.name).values
test
name points <10
0 joe 12 1
1 dan 3 4
2 betty 5 2
3 joe 10 1
4 dan 5 4
5 betty 16 2
6 joe 2 1
7 dan 8 4
8 betty 15 2
9 joe 17 1
10 dan 1 4
11 betty 3 2
Upvotes: 2