Reputation: 77
I have a dataframe with raw data, including pertinent data shown below, which is ordered by timestamp.
df1
Line Timestamp Depot Product
0 Line1 08:00 1 P1
1 Line1 08:10 1 P1
2 Line2 08:15 1 P4
3 Line1 08:20 1 P1
4 Line3 08:23 2 P13
5 Line1 08:30 2 P1
I have a summary tables for each production line derived from this, with pertinent data as shown below:
df2
Line Product Depot Time_Complete
0 Line 1 P1 1 NaN
1 Line 1 P1 2 NaN
2 Line 1 P2 1 NaN
3 Line 1 P2 1 NaN
What I'm trying to do is take the last timestamp from the first dataframe matching particular criteria. In this case if Line 1, P1, Depot 1 has the last time stamp would be 08:20. I'm not 100% sure where to start with this type of wrangling
Upvotes: 1
Views: 40
Reputation: 862681
I think need drop_duplicates
by specified columns and then merge
with left join:
df11 = df1.drop_duplicates(['Line','Depot','Product'], keep='last')
df = df2.merge(df11, on=['Line','Depot','Product'], how='left')
print (df)
Line Product Depot Time_Complete Datestamp
0 Line1 P1 1 NaN 08:20
1 Line1 P1 2 NaN 08:30
2 Line1 P2 1 NaN NaN
3 Line1 P2 1 NaN NaN
Upvotes: 1