Reputation: 443
My dataset look like this
Tr, Date, Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
I need to select only one item from each Tr
having latest record i.e latest record having date and and time higher
Required output is
Tr, Date, Time
AV81312,20161014,160221
AV85012,20170422,150858
AV86157,20170426,45747
My code is
df2 = read_csv("sample.csv")
df2 = df2.values
x = []
for i in df2:
for j in df2:
if i[2] == j[2]:
if i[3] >= j[3]:
x.append(i)
It wasn't working as expected
Upvotes: 1
Views: 33
Reputation: 9081
Use -
df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str)),format='%Y%m%d%H%M%S')
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1)
Output
Tr Date Time
0 AV81312 20161014 160221
1 AV85012 20170422 150858
2 AV86157 20170426 45747
Upvotes: 1