Varun Kuntal
Varun Kuntal

Reputation: 97

Get a new column with total count of all values between a specific range in a group_by df

An existing dataframe that was grouped by "u_id"

u_id  u_name   u_points   
   1      AA         12   
   1      AA         8    
   1      AA         5    
   2      BC         15   
   2      BC         9    
   2      BC         10   
   2      BC         6    
   2      BC         8  

needs to have a new column that records a count of total "u_points" between a range (say 7-12, 7 inclusive, 12 exclusive) into every row. So the new dataframe would look like so:

u_id  u_name   u_points     u_range_(7-12)
   1      AA         12                  1 
   1      AA         8                   1
   1      AA         5                   1
   2      BC         15                  3
   2      BC         9                   3
   2      BC         10                  3
   2      BC         6                   3
   2      BC         8                   3

I tried using count() but I could get a series with only those index values but I need output in all rows. Also tried transform but it didn't work with condition "between".

Upvotes: 1

Views: 74

Answers (3)

Varun Kuntal
Varun Kuntal

Reputation: 97

I found a 'low effort way' to achieve the same result

df["u_range_(7-12)"] = pd.Series(np.where(df.u_points.between(7,12-1).values, 1,0)).groupby(df2['u_id']).transform('sum')

As I am new to Pandas, personally I find this preferable as it uses 'familiar' operations than the other well written answers.

Upvotes: -1

Chris
Chris

Reputation: 16147

df.merge(df[df['u_points'].between(7,12-1)].groupby('u_id').size().to_frame(name='u_range_(7-12)').reset_index(),
         on='u_id')

Upvotes: 1

jezrael
jezrael

Reputation: 862581

Create mask by Series.ge for >= and chain by & for bitwise AND by mask by Series.lt for <, convert to numbers by Series.view or Series.astype and then use GroupBy.transform with sum for new column filled by aggregate values:

m = df['u_points'].ge(7) & df['u_points'].lt(12)

df['u_range_(7-12)'] = m.view('i1').groupby(df['u_name']).transform('sum')
#alternative
#df['u_range_(7-12)'] = m.astype('int').groupby(df['u_name']).transform('sum')
print (df)
   u_id u_name  u_points  u_range_(7-12)
0     1     AA        12               1
1     1     AA         8               1
2     1     AA         5               1
3     2     BC        15               3
4     2     BC         9               3
5     2     BC        10               3
6     2     BC         6               3
7     2     BC         8               3

Similar idea with appended new column:

m = df['u_points'].ge(7) & df['u_points'].lt(12)

df['u_range_(7-12)'] = df.assign(m = m.astype('int')).groupby('u_name')['m'].transform('sum')
print (df)
   u_id u_name  u_points  u_range_(7-12)
0     1     AA        12               1
1     1     AA         8               1
2     1     AA         5               1
3     2     BC        15               3
4     2     BC         9               3
5     2     BC        10               3
6     2     BC         6               3
7     2     BC         8               3

Upvotes: 1

Related Questions