Reputation: 5107
I have two data-frames. I am trying to update one data-frame with another.
df1
looks like:
col2 col3 dept
date
2020-05-08 29 21 A
2020-05-09 56 12 B
2020-05-10 82 15 C
2020-05-11 13 9 D
2020-05-12 35 13 E
2020-05-13 53 87 F
2020-05-14 25 9 G
2020-05-15 23 63 H
and df2
looks like:
col2 dept
date
2020-05-08 64 A
2020-05-09 41 B
2020-05-10 95 C
2020-05-11 58 D
2020-05-12 89 E
2020-05-13 37 F
2020-05-14 24 G
2020-05-15 67 H
I can update df1
with df2
no problem when the dates in the index of both data-frames match using:
df1 = df1.set_index('dept', append=True)
df2 = df2.set_index('dept', append=True)
df1.update(df2)
df1 = df1.reset_index(level='dept')
df2 = df2.reset_index(level='dept')
The above brings dept into the index and I update df1
with df2
So col2
in df1
is updated with the output being:
dept col2 col3
date
2020-05-07 A 64 21
2020-05-08 B 41 12
2020-05-09 C 95 15
2020-05-10 D 58 9
2020-05-11 E 89 13
2020-05-12 F 37 87
2020-05-13 G 24 9
2020-05-14 H 67 63
However - when the index dates are different for example when df2 looks like:
col2 dept
date
2020-05-07 64 A
2020-05-08 41 B
2020-05-09 95 C
2020-05-10 58 D
2020-05-11 89 E
2020-05-12 37 F
2020-05-13 24 G
2020-05-14 67 H
Using the above code df1
does not update with the values from df2
. How can I update a date-frame with another data-frame when the index is different? A solution is to make the dates in df2
the same as df1
then that will be fine (but how would you do that).
Upvotes: 1
Views: 636
Reputation: 34056
You want to update df1
from df2
on a combination of indexes(date
and dept
).
Your same code would have worked fine, only if you would have had matching indexes across dataframes.
The indexes for df1 after df1 = df1.set_index('dept', append=True)
are:
In [160]: df1.index.tolist()
Out[160]:
[('2020-05-08', 'A'),
('2020-05-09', 'B'),
('2020-05-10', 'C'),
('2020-05-11', 'D'),
('2020-05-12', 'E'),
('2020-05-13', 'F'),
('2020-05-14', 'G'),
('2020-05-15', 'H')]
Taking your second df2
for example. Index for df2
after df2 = df2.set_index('dept', append=True)
are:
In [159]: df2.index.tolist()
Out[159]:
[('2020-05-07', 'A'),
('2020-05-08', 'B'),
('2020-05-09', 'C'),
('2020-05-10', 'D'),
('2020-05-11', 'E'),
('2020-05-12', 'F'),
('2020-05-13', 'G'),
('2020-05-14', 'H')]
You can see that indexes of both dataframes do not match for any row. Hence, it did not update. If you would have had matching indexes, you would get desired output from your code itself.
Upvotes: 3