Reputation: 97
An existing dataframe that was grouped by "u_id"
u_id u_name u_points
1 AA 12
1 AA 8
1 AA 5
2 BC 15
2 BC 9
2 BC 10
2 BC 6
2 BC 8
needs to have a new column that records a count of total "u_points" between a range (say 7-12, 7 inclusive, 12 exclusive) into every row. So the new dataframe would look like so:
u_id u_name u_points u_range_(7-12)
1 AA 12 1
1 AA 8 1
1 AA 5 1
2 BC 15 3
2 BC 9 3
2 BC 10 3
2 BC 6 3
2 BC 8 3
I tried using count() but I could get a series with only those index values but I need output in all rows. Also tried transform but it didn't work with condition "between".
Upvotes: 1
Views: 74
Reputation: 97
I found a 'low effort way' to achieve the same result
df["u_range_(7-12)"] = pd.Series(np.where(df.u_points.between(7,12-1).values, 1,0)).groupby(df2['u_id']).transform('sum')
As I am new to Pandas, personally I find this preferable as it uses 'familiar' operations than the other well written answers.
Upvotes: -1
Reputation: 16147
df.merge(df[df['u_points'].between(7,12-1)].groupby('u_id').size().to_frame(name='u_range_(7-12)').reset_index(),
on='u_id')
Upvotes: 1
Reputation: 862581
Create mask by Series.ge
for >=
and chain by &
for bitwise AND
by mask by Series.lt
for <
, convert to numbers by Series.view
or Series.astype
and then use GroupBy.transform
with sum
for new column filled by aggregate values:
m = df['u_points'].ge(7) & df['u_points'].lt(12)
df['u_range_(7-12)'] = m.view('i1').groupby(df['u_name']).transform('sum')
#alternative
#df['u_range_(7-12)'] = m.astype('int').groupby(df['u_name']).transform('sum')
print (df)
u_id u_name u_points u_range_(7-12)
0 1 AA 12 1
1 1 AA 8 1
2 1 AA 5 1
3 2 BC 15 3
4 2 BC 9 3
5 2 BC 10 3
6 2 BC 6 3
7 2 BC 8 3
Similar idea with appended new column:
m = df['u_points'].ge(7) & df['u_points'].lt(12)
df['u_range_(7-12)'] = df.assign(m = m.astype('int')).groupby('u_name')['m'].transform('sum')
print (df)
u_id u_name u_points u_range_(7-12)
0 1 AA 12 1
1 1 AA 8 1
2 1 AA 5 1
3 2 BC 15 3
4 2 BC 9 3
5 2 BC 10 3
6 2 BC 6 3
7 2 BC 8 3
Upvotes: 1