Reputation: 2325
I have two Pandas series (d1 and d2), each indexed by datetime and holding one column of float data with some NaN. Both indices are at one-day intervals, but each has many gaps where days are missing. d1 ranges from 1974-12-16 to 2002-01-30. d2 ranges from 1997-12-19 to 2017-07-06. The overlap from 1997-12-19 to 2002-01-30 contains many dates that appear in both series. For those shared dates, the two values are sometimes equal, sometimes different, and sometimes one is a number while the other is NaN.
I would like to combine these two series into one, prioritizing the data from d2 whenever an index appears in both (that is, replace the d1 value with the d2 value for every duplicated index). What is the most efficient way to do this among the many Pandas tools available (merge, join, concatenate, etc.)?
Here is an example of my data:
In [7]: print d1
fldDate
1974-12-16 19.0
1974-12-17 28.0
1974-12-18 24.0
1974-12-19 18.0
1974-12-20 17.0
1974-12-21 28.0
1974-12-22 28.0
1974-12-23 10.0
1974-12-24 6.0
1974-12-25 5.0
1974-12-26 12.0
1974-12-27 19.0
1974-12-28 22.0
1974-12-29 20.0
1974-12-30 16.0
1974-12-31 12.0
1975-01-01 12.0
1975-01-02 15.0
1975-01-03 14.0
1975-01-04 15.0
1975-01-05 18.0
1975-01-06 21.0
1975-01-07 22.0
1975-01-08 18.0
1975-01-09 20.0
1975-01-10 12.0
1975-01-11 8.0
1975-01-12 -2.0
1975-01-13 13.0
1975-01-14 24.0
...
2002-01-01 18.0
2002-01-02 16.0
2002-01-03 NaN
2002-01-04 24.0
2002-01-05 23.0
2002-01-06 15.0
2002-01-07 22.0
2002-01-08 34.0
2002-01-09 35.0
2002-01-10 29.0
2002-01-11 21.0
2002-01-12 24.0
2002-01-13 NaN
2002-01-14 18.0
2002-01-15 14.0
2002-01-16 10.0
2002-01-17 5.0
2002-01-18 7.0
2002-01-19 7.0
2002-01-20 7.0
2002-01-21 11.0
2002-01-22 NaN
2002-01-23 9.0
2002-01-24 8.0
2002-01-25 15.0
2002-01-26 NaN
2002-01-27 NaN
2002-01-28 18.0
2002-01-29 13.0
2002-01-30 13.0
Name: MaxTempMid, dtype: float64
In [8]: print d2
fldDate
1997-12-19 22.0
1997-12-20 14.0
1997-12-21 18.0
1997-12-22 16.0
1997-12-23 16.0
1997-12-24 10.0
1997-12-25 12.0
1997-12-26 12.0
1997-12-27 9.0
1997-12-28 12.0
1997-12-29 18.0
1997-12-30 23.0
1997-12-31 28.0
1998-01-01 26.0
1998-01-02 29.0
1998-01-03 27.0
1998-01-04 22.0
1998-01-05 19.0
1998-01-06 17.0
1998-01-07 14.0
1998-01-08 14.0
1998-01-09 14.0
1998-01-10 16.0
1998-01-11 20.0
1998-01-12 21.0
1998-01-13 19.0
1998-01-14 20.0
1998-01-15 16.0
1998-01-16 17.0
1998-01-17 20.0
...
2017-06-07 68.0
2017-06-08 71.0
2017-06-09 71.0
2017-06-10 59.0
2017-06-11 41.0
2017-06-12 57.0
2017-06-13 58.0
2017-06-14 36.0
2017-06-15 50.0
2017-06-16 58.0
2017-06-17 54.0
2017-06-18 53.0
2017-06-19 58.0
2017-06-20 68.0
2017-06-21 71.0
2017-06-22 71.0
2017-06-23 59.0
2017-06-24 61.0
2017-06-25 65.0
2017-06-26 68.0
2017-06-27 71.0
2017-06-28 60.0
2017-06-29 54.0
2017-06-30 48.0
2017-07-01 60.0
2017-07-02 68.0
2017-07-03 65.0
2017-07-04 73.0
2017-07-05 74.0
2017-07-06 77.0
Name: MaxTempMid, dtype: float64
Upvotes: 2
Views: 892
Reputation: 153460
Let's use combine_first, which keeps d2's values and falls back to d1 only where d2 is NaN or has no entry for that date:
d2.combine_first(d1)
Output:
fldDate
1974-12-16 19.0
1974-12-17 28.0
1974-12-18 24.0
1974-12-19 18.0
1974-12-20 17.0
1974-12-21 28.0
1974-12-22 28.0
1974-12-23 10.0
1974-12-24 6.0
1974-12-25 5.0
1974-12-26 12.0
1974-12-27 19.0
1974-12-28 22.0
1974-12-29 20.0
1974-12-30 16.0
1974-12-31 12.0
1975-01-01 12.0
1975-01-02 15.0
1975-01-03 14.0
1975-01-04 15.0
1975-01-05 18.0
1975-01-06 21.0
1975-01-07 22.0
1975-01-08 18.0
1975-01-09 20.0
1975-01-10 12.0
1975-01-11 8.0
1975-01-12 -2.0
1975-01-13 13.0
1975-01-14 24.0
...
2017-06-07 68.0
2017-06-08 71.0
2017-06-09 71.0
2017-06-10 59.0
2017-06-11 41.0
2017-06-12 57.0
2017-06-13 58.0
2017-06-14 36.0
2017-06-15 50.0
2017-06-16 58.0
2017-06-17 54.0
2017-06-18 53.0
2017-06-19 58.0
2017-06-20 68.0
2017-06-21 71.0
2017-06-22 71.0
2017-06-23 59.0
2017-06-24 61.0
2017-06-25 65.0
2017-06-26 68.0
2017-06-27 71.0
2017-06-28 60.0
2017-06-29 54.0
2017-06-30 48.0
2017-07-01 60.0
2017-07-02 68.0
2017-07-03 65.0
2017-07-04 73.0
2017-07-05 74.0
2017-07-06 77.0
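One detail worth checking against the question: on a shared date where d2 is NaN and d1 has a number, combine_first keeps the d1 number. If d2 should win on every shared date, NaN included, one alternative is to concatenate and drop the earlier duplicates. The sketch below compares the two on small made-up series; the values are hypothetical stand-ins rather than the asker's data, and the variable names filled, stacked and strict are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for d1 and d2 (not the asker's values).
d1 = pd.Series(
    [10.0, 11.0, 12.0, np.nan],
    index=pd.to_datetime(["2002-01-01", "2002-01-02", "2002-01-03", "2002-01-04"]),
    name="MaxTempMid",
)
d2 = pd.Series(
    [20.0, np.nan, 22.0, 23.0],
    index=pd.to_datetime(["2002-01-02", "2002-01-03", "2002-01-04", "2002-01-05"]),
    name="MaxTempMid",
)

# Option 1: d2 wins wherever it has a non-NaN value; d1 fills d2's gaps.
# On 2002-01-03 d2 is NaN, so the d1 value (12.0) is kept.
filled = d2.combine_first(d1)

# Option 2: d2 wins on every shared date, even where d2 is NaN.
# Stack d1 first and d2 second, then keep only the last entry per date.
stacked = pd.concat([d1, d2])
strict = stacked[~stacked.index.duplicated(keep="last")].sort_index()

print(filled)
print(strict)
```

Both results cover the union of the two date ranges. They differ only on shared dates where d2 is NaN, so the choice depends on whether a NaN in d2 means "no reading, use d1" or "this reading supersedes d1".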
Upvotes: 1