Reputation: 1591
I have dataframe in the following general format:
customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2
I have it sorted by the relevant fields in ascending order. Now what I am trying to figure out how to check for a criteria inside a group and create a new indicator flag for only first time it occurs. As a toy example, I am trying to figure out something like this to start:
conditions = ((df['units'] > 20) | (df['price] > 50)
df['flag'] = df[conditions].groupby(['customer_id']).transform()
Any help on how best to formulate this properly would be most welcome!
Upvotes: 0
Views: 1489
Reputation: 21264
Assuming you want the first chronological appearance of a customer_id
, within the grouping you defined, you can use query
, groupby
, and first
:
(
df.sort_values("transaction_dt")
.query("units > 20 & price > 50")
.groupby("customer_id")
.first()
)
Note: The example data you provided doesn't actually have multiple customer_id
entries for the filters you specified, but the syntax will work in either case.
Upvotes: 1