kostya ivanov
kostya ivanov

Reputation: 707

Concatenating two dataframes with one common column and replacing the values of the column with the values of another

I have two data sets. I need to concatenate them and replace the values of one column in the dataframe with the values of another column in the other dataframe.

Example data:

df1

date        col1   col2   col3
2020-03-27  NaN    0      NaN
2020-03-28  1      NaN    NaN
2020-03-29  2      4      2
2020-03-30  3      3      NaN
2020-03-31  3      2      NaN
2020-04-01  NaN    2      NaN
df2

date        col3
2020-03-27  NaN
2020-03-28  1
2020-03-29  2
2020-03-30  3
2020-03-31  NaN

As a result of the union, I want to get the following result:

df_res

date        col1   col2   col3
2020-03-27  NaN    0      NaN
2020-03-28  1      NaN    1
2020-03-29  2      4      2
2020-03-30  3      3      3
2020-03-31  3      2      NaN
2020-04-01  NaN    2      NaN

I tried pandas update but it didn't work for me.

Worked best: df_res = pd.merge(df1, df2, how='left', on='date'), but it creates stores two versions of the columns:

df_res

date        col1   col2   col3_x   col3_y
2020-03-27  NaN    0      NaN      NaN
2020-03-28  1      NaN    NaN      1
2020-03-29  2      4      2        2
2020-03-30  3      3      NaN      3
2020-03-31  3      2      NaN      NaN
2020-04-01  NaN    2      NaN      NaN

How can this merger be done?

Upvotes: 0

Views: 27

Answers (3)

PaulS
PaulS

Reputation: 25333

Another possible solution, which is based on the idea of first using date as index in both dataframes and then assigning col3 of df2 to col3 of df1:

out = df1.set_index('date')
df2 = df2.set_index('date')

out['col3'] = df2['col3']
out = out.reset_index()

Output:

         date  col1  col2  col3
0  2020-03-27   NaN   0.0   NaN
1  2020-03-28   1.0   NaN   1.0
2  2020-03-29   2.0   4.0   2.0
3  2020-03-30   3.0   3.0   3.0
4  2020-03-31   3.0   2.0   NaN
5  2020-04-01   NaN   2.0   NaN

Upvotes: 0

BeRT2me
BeRT2me

Reputation: 13242

You can use pd.DataFrame.combine_first

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame.

Doing:

df1 = df1.set_index('date')
df2 = df2.set_index('date')

df = df1.combine_first(df2)
print(df)

Output:

            col1  col2  col3
date
2020-03-27   NaN   0.0   NaN
2020-03-28   1.0   NaN   1.0
2020-03-29   2.0   4.0   2.0
2020-03-30   3.0   3.0   3.0
2020-03-31   3.0   2.0   NaN
2020-04-01   NaN   2.0   NaN

Upvotes: 2

Sachin Kohli
Sachin Kohli

Reputation: 1986

After merging in df_res, you can do this maybe;

df_res["col3"] = df_res.apply(lambda x: x["col3_x"] if pd.isna(x["col3_y"]) else x["col3_x"],axis=1)

Upvotes: 0

Related Questions