Reputation: 405
here is my problem. I have a df like this one :
user_id profile item level amount cumulative_amount
1 1 1 1 10 10
1 1 1 2 30 40
1 1 2 1 10 10
1 1 2 2 10 20
1 1 2 3 20 40
1 1 3 1 40 40
1 1 4 1 20 20
1 1 4 2 20 40
2 1 1 1 10 10
2 1 1 5 30 40
2 1 2 1 10 10
2 1 2 2 10 20
2 1 2 6 20 40
2 1 3 6 40 40
2 1 4 1 20 20
2 1 4 3 20 40
For each item, user_id and profile I need to know what is the level when the cumulative amount is above a certain threshold (ex:40), and store the result in a list of lists. For example, I should have something like:
[[2, 3, 1, 2], [5,6,6,3]]
Thanks everyone for the help!
Upvotes: 0
Views: 156
Reputation: 261850
IIUC, you can filter for values above or equal to threshold (40), then get the first matching level per group:
(df
.loc[df['cumulative_amount'].ge(40)]
.groupby(['user_id', 'profile', 'item'])
['level'].first()
)
output Series:
user_id profile item
1 1 1 2
2 3
3 1
4 2
2 1 1 5
2 6
3 6
4 3
Name: thresh, dtype: int64
Then to get a list per user_id
:
out_lst = (df
.loc[df['cumulative_amount'].ge(40)]
.groupby(['user_id', 'profile', 'item'])
['level'].first()
.groupby(level='user_id').agg(list).to_list()
)
output: [[2, 3, 1, 2], [5, 6, 6, 3]]
Upvotes: 2