ianux22
ianux22

Reputation: 405

find when a value is above threshold and store the result in a list in pandas

here is my problem. I have a df like this one :

user_id   profile   item   level   amount   cumulative_amount
  1         1         1     1        10        10
  1         1         1     2        30        40
  1         1         2     1        10        10
  1         1         2     2        10        20
  1         1         2     3        20        40
  1         1         3     1        40        40
  1         1         4     1        20        20
  1         1         4     2        20        40
  2         1         1     1        10        10
  2         1         1     5        30        40
  2         1         2     1        10        10
  2         1         2     2        10        20
  2         1         2     6        20        40
  2         1         3     6        40        40
  2         1         4     1        20        20
  2         1         4     3        20        40

For each item, user_id and profile I need to know what is the level when the cumulative amount is above a certain threshold (ex:40), and store the result in a list of lists. For example, I should have something like:

[[2, 3, 1, 2], [5,6,6,3]]

Thanks everyone for the help!

Upvotes: 0

Views: 156

Answers (1)

mozway
mozway

Reputation: 261850

IIUC, you can filter for values above or equal to threshold (40), then get the first matching level per group:

(df
 .loc[df['cumulative_amount'].ge(40)]
 .groupby(['user_id', 'profile', 'item'])
 ['level'].first()
)

output Series:

user_id  profile  item
1        1        1       2
                  2       3
                  3       1
                  4       2
2        1        1       5
                  2       6
                  3       6
                  4       3
Name: thresh, dtype: int64

Then to get a list per user_id:

out_lst = (df
 .loc[df['cumulative_amount'].ge(40)]
 .groupby(['user_id', 'profile', 'item'])
 ['level'].first()
 .groupby(level='user_id').agg(list).to_list()
)

output: [[2, 3, 1, 2], [5, 6, 6, 3]]

Upvotes: 2

Related Questions