Alon G

Reputation: 35

Pandas DataFrame: delete duplicates based on a date column

I have two DataFrames with the same columns, one of which is a date.

I want to concatenate the DataFrames and, when the primary keys (pk1, pk2) are the same, delete the row with the earlier date.

Input (df1 & df2):

pk1 | pk2 |  C  |   DATE  
 1  |  2  |  3  | 05-09-22
 2  |  3  |  4  | 05-09-22


pk1 | pk2 |  C  |   DATE  
 1  |  2  |  5  | 06-09-22

Output:

pk1 | pk2 |  C  |   DATE  
 2  |  3  |  4  | 05-09-22
 1  |  2  |  5  | 06-09-22

Upvotes: 2

Views: 51

Answers (1)

gtomer

Reputation: 6564

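You need to sort by date and then call drop_duplicates on the primary-key columns, keeping the last (most recent) row for each key.

import pandas as pd

df = pd.concat([df1, df2])  # concatenate both frames
df = df.sort_values(by='DATE')  # oldest first (assumes DATE sorts chronologically, e.g. a datetime column)
df = df.drop_duplicates(subset=['pk1', 'pk2'], keep='last')  # per key, keep only the newest row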
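
For reference, here is a minimal self-contained sketch that runs on the sample data from the question. It assumes the DATE strings are day-month-year (`%d-%m-%y`), which the question does not actually state, and parses them before sorting so the order is chronological rather than alphabetical. The helper name `keep_latest` and its parameters are made up for illustration.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"pk1": [1, 2], "pk2": [2, 3], "C": [3, 4],
                    "DATE": ["05-09-22", "05-09-22"]})
df2 = pd.DataFrame({"pk1": [1], "pk2": [2], "C": [5],
                    "DATE": ["06-09-22"]})


def keep_latest(frames, keys, date_col="DATE", date_format="%d-%m-%y"):
    """Concatenate frames and keep, per key, the row with the latest date.

    date_format is an assumption (day-month-year); adjust it to your data.
    """
    df = pd.concat(frames, ignore_index=True)
    # Parse the date strings so sorting is chronological, not lexicographic.
    parsed = pd.to_datetime(df[date_col], format=date_format)
    # Stable sort: rows with equal dates keep their concatenation order.
    order = parsed.sort_values(kind="mergesort").index
    df = df.loc[order]
    # After an ascending sort, the last row per key is the most recent one.
    return df.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)


result = keep_latest([df1, df2], keys=["pk1", "pk2"])
print(result)
```

The original DATE strings are left untouched in the result; only the sort order comes from the parsed dates. If two rows share both the key and the date, the one that came later in the concatenation wins.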

Upvotes: 1
