Reputation: 168
I have a two pandas dataframe with several rows that are near duplicates of each other, except for one value, which is timestamp value. My goal is to merge these dataframes into a single dataframe, and for these nearly repeat rows, get the row with the last timestamp.
Here is an example of what I'm working with:
DF1:
id name created_at
0 1 Cristiano Ronaldo 2020-01-20
1 2 Messi 2020-01-20
2 3 Juarez 2020-01-20
DF2:
id name created_at
0 1 Cristiano Ronaldo 2020-01-20
1 2 Messi 2020-01-20
2 3 Juarez 2020-02-20
And here is what I would like:
id name created_at
3 1 Cristiano Ronaldo 2020-01-20
4 2 Messi 2020-01-20
5 3 Juarez 2020-02-20
For the row Juarez I get the last "created_ad"
Tha is it possible?
Upvotes: 2
Views: 338
Reputation: 2442
You can append the second dataframe to the first one, sort the dataframe using timestamp and then drop duplicates.
df_merged = df1.append(df2, ignore_index = True)
df_merged = df_merged.sort_values('created_at')
df_columns = df_merged.columns.tolist()
df_columns.remove('created_at')
df_merged.drop_duplicates(inplace = True, keep = 'last', subset = df_columns)
Upvotes: 1