Python PANDAS: Groupby Transform First Occurrence of Condition

Question

I have dataframe in the following general format:

customer_id,transaction_dt,product,price,units
1,2004-01-02 00:00:00,thing1,25,47
1,2004-01-17 00:00:00,thing2,150,8
2,2004-01-29 00:00:00,thing2,150,25
3,2017-07-15 00:00:00,thing3,55,17
3,2016-05-12 00:00:00,thing3,55,47
4,2012-02-23 00:00:00,thing2,150,22
4,2009-10-10 00:00:00,thing1,25,12
4,2014-04-04 00:00:00,thing2,150,2
5,2008-07-09 00:00:00,thing2,150,43
5,2004-01-30 00:00:00,thing1,25,40
5,2004-01-31 00:00:00,thing1,25,22
5,2004-02-01 00:00:00,thing1,25,2

I have it sorted by the relevant fields in ascending order. Now what I am trying to figure out how to check for a criteria inside a group and create a new indicator flag for only first time it occurs. As a toy example, I am trying to figure out something like this to start:

conditions = ((df['units'] > 20) | (df['price] > 50)

df['flag'] = df[conditions].groupby(['customer_id']).transform()

Any help on how best to formulate this properly would be most welcome!

andrew_reece · Accepted Answer

Assuming you want the first chronological appearance of a customer_id, within the grouping you defined, you can use query, groupby, and first:

(
df.sort_values("transaction_dt")
  .query("units > 20 & price > 50")
  .groupby("customer_id")
  .first()
)

Note: The example data you provided doesn't actually have multiple customer_id entries for the filters you specified, but the syntax will work in either case.

Python PANDAS: Groupby Transform First Occurrence of Condition

Answers (1)

Related Questions