FunnyCoder
FunnyCoder

Reputation: 443

Group a dataframe and select one cell among them based on a condition

My dataset look like this Tr, Date, Time AV81312,20161014,121000 AV81312,20161014,160221 AV85012,20170422,150858 AV85012,20161108,11137 AV86157,20170426,45747 AV86157,20170426,45744 AV86157,20160813,134312

I need to select only one item from each Tr having latest record i.e latest record having date and and time higher

Required output is Tr, Date, Time AV81312,20161014,160221 AV85012,20170422,150858 AV86157,20170426,45747

My code is

df2 = read_csv("sample.csv") df2 = df2.values x = [] for i in df2: for j in df2: if i[2] == j[2]: if i[3] >= j[3]: x.append(i) It wasn't working as expected

Upvotes: 1

Views: 33

Answers (1)

Vivek Kalyanarangan
Vivek Kalyanarangan

Reputation: 9081

Use -

df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str)),format='%Y%m%d%H%M%S')
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1)

Output

        Tr      Date    Time
0  AV81312  20161014  160221
1  AV85012  20170422  150858
2  AV86157  20170426   45747

Upvotes: 1

Related Questions