Reputation: 3177
I have 2 dataframes like this
df = pd.DataFrame({"date":["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"],
"A": [1., 2., 3., 4.],
"B": ["a", "b", "c", "d"]})
df["date"] = pd.to_datetime(df["date"])
df_new = pd.DataFrame({"date":["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-05", "2019-01-06"],
"A": [2, 3.5, 4, 5., 6.],
"B": ["b", "c1", "d", "e", "f"]})
df_new["date"] = pd.to_datetime(df_new["date"])
So, my dataframes look like this
df
-----------------------
date A B
2019-01-01 1 a
2019-01-02 2 b
2019-01-03 3 c
2019-01-04 4 d
df_new
----------------------
date A B
2019-01-02 2 b
2019-01-03 3.5 c1
2019-01-04 4 d
2019-01-05 5 e
2019-01-06 6 f
From these dataframes, I would like to append df to df_new with specific condition as follows:
Any row with date available in both dataframe, we take such rows in df_new
Any row with date available in df but not in df_new, we take such rows in df
Finally my expected output look like this
Expected output
----------------------
date A B
2019-01-01 1 a (take from df)
2019-01-02 2 b (take from df_new)
2019-01-03 3.5 c1 (take from df_new)
2019-01-04 4 d (take from df_new)
2019-01-05 5 e (take from df_new)
2019-01-06 6 f (take from df_new)
I can think of finding the row difference between 2 dataframes but it does not work when I take the date column into account. May I have your suggestions? Thank you.
Upvotes: 2
Views: 2399
Reputation: 13401
You can use pandas.DataFrame.merge
with outer
and then use drop_duplicates
to remove duplicate rows.
df1 = df_new.merge(df, how='outer', on=['date','A','B']).sort_values(by='date').drop_duplicates('date').reset_index(drop=True)
print(df1)
Output:
A B date
0 1.0 a 2019-01-01
1 2.0 b 2019-01-02
2 3.5 c1 2019-01-03
3 4.0 d 2019-01-04
4 5.0 e 2019-01-05
5 6.0 f 2019-01-06
Upvotes: 0
Reputation: 862481
Use concat
and remove duplicates by date
column by DataFrame.drop_duplicates
, last create default uniqe index values by DataFrame.reset_index
:
df = pd.concat([df, df_new]).drop_duplicates('date', keep='last').reset_index(drop=True)
print (df)
date A B
0 2019-01-01 1.0 a
1 2019-01-02 2.0 b
2 2019-01-03 3.5 c1
3 2019-01-04 4.0 d
4 2019-01-05 5.0 e
5 2019-01-06 6.0 f
Upvotes: 3