Reputation: 1793
I have the below dataframe df:
| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1 | 3/29/2016 | 4/23/2016 |
| 1 | 3/29/2016 | 3/29/2016 |
| 1 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 |
| 2 | 5/15/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 |
| 3 | 7/24/2016 | 4/21/2016 |
I want to get the Min and Max "Time_Stamp Date" partition by "Staff_ID" such that the resultant dataframe is as follows:
| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1 | 3/29/2016 | 4/23/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 3/29/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 6/21/2016 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 | 4/1/2016 | 5/25/2016 |
| 2 | 5/15/2016 | 5/25/2016 | 4/1/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 4/21/2016 | 4/21/2016 | 6/21/2016 |
How can I do this in Python ?
Upvotes: 5
Views: 993
Reputation: 153490
Let's use groupby
with transform
and assign
:
g = df.groupby('Staff_ID')['Time_Stamp']
df.assign(Min_Time_Stamp = g.transform(min), Max_Time_Stamp = g.transform(max))
Output:
Staff_ID Join_Date Time_Stamp Max_Time_Stamp Min_Time_Stamp
1 1 3/29/2016 4/23/2016 6/21/2016 3/29/2016
2 1 3/29/2016 3/29/2016 6/21/2016 3/29/2016
3 1 3/29/2016 6/21/2016 6/21/2016 3/29/2016
4 2 5/15/2016 4/1/2016 5/25/2016 4/1/2016
5 2 5/15/2016 5/25/2016 5/25/2016 4/1/2016
6 3 7/24/2016 6/21/2016 6/21/2016 4/21/2016
7 3 7/24/2016 6/10/2016 6/21/2016 4/21/2016
8 3 7/24/2016 4/21/2016 6/21/2016 4/21/2016
@CarlesMitjans method:
10 loops, best of 3: 33.3 ms per loop
@ScottBoston method:
100 loops, best of 3: 5.52 ms per loop
Upvotes: 2
Reputation: 4866
You can use groupby
and then merge the results:
group = df.groupby("Staff_ID", as_index=False)["Time_Stamp"]
df = pd.merge(df, group.min(), on=["Staff_ID"])
df = df.rename(columns = {"Time_Stamp_x" : "Time_Stamp", "Time_Stamp_y": "Min_Time_Stamp"})
df = pd.merge(df, group.max(), on=["Staff_ID"])
df = df.rename(columns = {"Time_Stamp_x" : "Time_Stamp", "Time_Stamp_y" : "Max_Time_Stamp"})
Result:
Join_Date Staff_ID Time_Stamp Min_Time_Stamp Max_Time_Stamp
0 3/29/2016 1 4/23/2016 3/29/2016 6/21/2016
1 3/29/2016 1 3/29/2016 3/29/2016 6/21/2016
2 3/29/2016 1 6/21/2016 3/29/2016 6/21/2016
3 5/15/2016 2 4/1/2016 4/1/2016 5/25/2016
4 5/15/2016 2 5/25/2016 4/1/2016 5/25/2016
5 7/24/2016 3 6/21/2016 4/21/2016 6/21/2016
6 7/24/2016 3 6/10/2016 4/21/2016 6/21/2016
7 7/24/2016 3 4/21/2016 4/21/2016 6/21/2016
Upvotes: 0