Reputation: 125
I'm fairly new to pandas and I'm trying find out how to extract time value from a DataFrame with rows that contain same and different dates, different sellers, different states, and time stamp for each state.
I need to extract rows where State
= "received" shows up for the first time on each date from the DataFrame and for each seller.
As you can see in the screenshot above, the State "received" shows up twice on the same date for the same seller, but with different timestamp, I only need the rows where it appears for the first time.
PS. I haven't tried any code unfortunately, I searched a lot and came up empty. Any help is appreciated.
Upvotes: 0
Views: 213
Reputation: 5331
I would use:
df[df['State'] == 'Received'].sort_values(by=['Seller', 'Date', 'Time']) \
.drop_duplicates(subset=['Seller', 'Date'])
This would be mildly more performant than grouping.
Upvotes: 1
Reputation: 23146
IIUC, you need pandas.groupby
:
>>> df[df["State"]=="Received"].groupby(["Date", "Seller"])["Time"].first()
Date Seller
6/1/2021 Seller_1 8:00:36
Seller_2 9:45:30
6/2/2021 Seller_1 8:00:36
Seller_2 9:33:02
If you need earliest occurrence instead of first, you should use min()
instead.
Upvotes: 1