Reputation: 414
I have a data frame with a date index, from which a few dates somehow went missing. I’ll call this dataframe A. I have another data frame that includes the dates in question. I’ll call this dataframe B.
I’d like to merge the two dataframes: keep all indices of A and join it with B, but I don’t want any of the rows in B that share an index with A. That is, I want only the rows missing from A returned from B.
How is this most easily achieved?
Note:
This behavior is true for a database of data I have. I’ll be doing it roughly 400 times.
Upvotes: 2
Views: 764
Reputation: 1286
Although there is already a good answer, I want to share this one because it's so short:
pd.concat([A, B]).drop_duplicates(keep='first')
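One caveat worth checking before relying on this: as far as I know, drop_duplicates compares row values and ignores the index, so it only gives the wanted result when rows sharing a date are also identical in A and B. The sketch below uses small made-up frames (the column name x and the values are assumptions for illustration) to compare it with de-duplicating on the index itself:

```python
import pandas as pd

# Hypothetical toy data: B disagrees with A on 2019-02-01 and adds 2019-02-03.
A = pd.DataFrame({"x": [1, 2]},
                 index=pd.to_datetime(["2019-02-01", "2019-02-02"]))
B = pd.DataFrame({"x": [9, 2, 3]},
                 index=pd.to_datetime(["2019-02-01", "2019-02-02", "2019-02-03"]))

combined = pd.concat([A, B])

# De-duplicates on row values: the conflicting 2019-02-01 row from B survives.
by_value = combined.drop_duplicates(keep="first")

# De-duplicates on index labels: A wins wherever both frames have the date.
by_index = combined[~combined.index.duplicated(keep="first")]

print(by_value)
print(by_index)
```

Note also that drop_duplicates would remove a row from B that has a new date but happens to repeat values already seen, so the index-based version is probably the safer one for this question.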
Upvotes: 2
Reputation: 862441
I believe you need Index.difference:
B.loc[B.index.difference(A.index)]
EDIT:
A = pd.DataFrame({'A':range(10)}, index=pd.date_range('2019-02-01', periods=10))
B = pd.DataFrame({'A':range(10, 20)}, index=pd.date_range('2019-01-27', periods=10))
df = pd.concat([A, B.loc[B.index.difference(A.index)]]).sort_index()
print (df)
A
2019-01-27 10
2019-01-28 11
2019-01-29 12
2019-01-30 13
2019-01-31 14
2019-02-01 0
2019-02-02 1
2019-02-03 2
2019-02-04 3
2019-02-05 4
2019-02-06 5
2019-02-07 6
2019-02-08 7
2019-02-09 8
2019-02-10 9
df1= pd.concat([A, B])
df1 = df1[~df1.index.duplicated()].sort_index()
print (df1)
A
2019-01-27 10
2019-01-28 11
2019-01-29 12
2019-01-30 13
2019-01-31 14
2019-02-01 0
2019-02-02 1
2019-02-03 2
2019-02-04 3
2019-02-05 4
2019-02-06 5
2019-02-07 6
2019-02-08 7
2019-02-09 8
2019-02-10 9
Upvotes: 2
Reputation: 18201
If I'm reading the question correctly, what you want is
B[~B.index.isin(A.index)]
For example:
In [192]: A
Out[192]:
Empty DataFrame
Columns: []
Index: [1, 2, 4, 5]
In [193]: B
Out[193]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5]
In [194]: B[~B.index.isin(A.index)]
Out[194]:
Empty DataFrame
Columns: []
Index: [3]
To use the data from A when it's there, and otherwise take it from B, you could then do
pd.concat([A, B[~B.index.isin(A.index)]]).sort_index()
or, assuming that A contains no null elements that you want to keep, you could take a different approach and go for something like
pd.DataFrame(A, index=B.index).fillna(B)
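A related option, not mentioned above, is DataFrame.combine_first, which to my understanding takes the union of both indexes and prefers A's values wherever they are non-null. Here is a minimal sketch comparing it with the concat/isin version, using date-indexed sample frames made up for illustration:

```python
import pandas as pd

# Sample frames assumed for illustration: overlapping date ranges.
A = pd.DataFrame({"A": range(10)},
                 index=pd.date_range("2019-02-01", periods=10))
B = pd.DataFrame({"A": range(10, 20)},
                 index=pd.date_range("2019-01-27", periods=10))

# Union of the two indexes; values come from A where present, else from B.
filled = A.combine_first(B)

# The concat/isin approach, for comparison.
expected = pd.concat([A, B[~B.index.isin(A.index)]]).sort_index()

print(filled)
print(expected)
```

Unlike the reindex-to-B.index variant, this keeps dates that exist only in A. The same caveat about nulls applies: a null in A is filled from B, and depending on the pandas version the integer column may come back as float.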
Upvotes: 3