john smith
john smith

Reputation: 517

Replace data from one pandas dataframe to another

I have two dataframes df1 and df2 . They both contain time-series data, so it is possible some of the dates in df1 and df2 intersect with each other and the rest don't. My requirement is an operation on the two dataframes that replaces the values in df1 with the values in df2 for the same dates, leaves alone values for indexes in df1 not present in df2 and adds the values for indexes present in df2 and not in df1. Consider the following example:

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   E
1   A4  B4  C4  E4
2   A5  B5  C5  E5
3   A6  B6  C6  E6
4   A7  B7  C7  E7

result df:
    A   B   C   D   E
0   A0  BO  C0  D0  Nan
1   A4  B4  C4  D4  E4
2   A5  B5  C5  D5  E5
3   A6  B6  C6  D6  E6
4   A7  B7  C7  D7  E7

I tried to develop the logic with the first step concatenating the two dfs but that leads to rows with duplicate indexes and am not sure how to handle that. How can this be achieved? Any suggestions would help

Edit: A simpler case would be when the column names are same in the two dataframes. So consider df2 has column D instead of E with values D4,D5,D6,D7.

A concatenation yields the following result:

concat(df1,df2,axis=1)
    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN  
1   A1   B1   C1   D1   A4   B4   C4   D4
2   A2   B2   C2   D2   A5   B5   C5   D5
3   A3   B3   C3   D3   A6   B6   C6   D6
4  NaN  NaN  NaN  NaN   A7   B7   C7   D7

Now this introduces duplicate columns. A conventional solution would be to loop through each column but I am looking for a more elegant solution. Any ideas would be appreciated.

Upvotes: 6

Views: 7018

Answers (2)

embeepea
embeepea

Reputation: 667

I just saw this question and realized it is almost identical to one that I just asked today and that @Alexander (the poster of the answer above) answered very nicely:

pd.concat([df1[~df1.index.isin(df2.index)], df2])

See pandas DataFrame concat / update ("upsert")? for the discussion.

Upvotes: 2

Alexander
Alexander

Reputation: 109686

update will align on the indices of both DataFrames:

df1.update(df2)

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   D
1   A4  B4  C4  D4
2   A5  B5  C5  D5
3   A6  B6  C6  D6
4   A7  B7  C7  D7

>>> df1.update(df2)
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6

You then need to add the values in df2 not present in df1:

>>> df1.append(df2.loc[[i for i in df2.index if i not in df1.index], :])
Out[46]: 
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
4  A7  B7  C7  D7

Upvotes: 8

Related Questions