Hanzy

Reputation: 414

Pandas join (merge?) dataframes, keep only unique indices

I have a data frame with a date index. A few dates have somehow gone missing from it; I'll call this dataframe A. I have another data frame that includes the dates in question; I'll call this dataframe B.

I’d like to merge the two dataframes:

Keep all indices of A and join it with B, but I don’t want any of the rows in B that share an index with A. That is, I want only the rows missing from A returned from B.

How is this most easily achieved?

Note:

This applies across a whole database of data I have, so I’ll be doing it roughly 400 times.

Upvotes: 2

Views: 764

Answers (3)

JoergVanAken

Reputation: 1286

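Although there are already good answers, I want to share this one because it's so short. Note that drop_duplicates compares column values, not the index, so the deduplication has to key on the index instead:

pd.concat([A, B]).loc[lambda d: ~d.index.duplicated(keep='first')]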
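A quick check on small hypothetical frames (values chosen only for illustration) shows why the deduplication needs to key on the index: drop_duplicates compares column values, so a row from B whose date already exists in A survives whenever its values differ, while Index.duplicated drops it.

```python
import pandas as pd

# Hypothetical data: A is missing 2019-01-03, B has every date.
A = pd.DataFrame({'x': [1, 2, 4]},
                 index=pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-04']))
B = pd.DataFrame({'x': [10, 20, 30, 40]},
                 index=pd.date_range('2019-01-01', periods=4))

stacked = pd.concat([A, B])

# Deduplicating on column values keeps all 7 rows, since no value repeats.
by_values = stacked.drop_duplicates(keep='first')

# Deduplicating on the index keeps A's rows plus B's row for the missing date.
by_index = stacked[~stacked.index.duplicated(keep='first')].sort_index()

print(len(by_values))          # 7
print(by_index['x'].tolist())  # [1, 2, 30, 4]
```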

Upvotes: 2

jezrael

Reputation: 862441

I believe you need Index.difference:

B.loc[B.index.difference(A.index)]

EDIT:

import pandas as pd

A = pd.DataFrame({'A':range(10)}, index=pd.date_range('2019-02-01', periods=10))
B = pd.DataFrame({'A':range(10, 20)}, index=pd.date_range('2019-01-27', periods=10))

df = pd.concat([A, B.loc[B.index.difference(A.index)]]).sort_index()
print(df)
             A
2019-01-27  10
2019-01-28  11
2019-01-29  12
2019-01-30  13
2019-01-31  14
2019-02-01   0
2019-02-02   1
2019-02-03   2
2019-02-04   3
2019-02-05   4
2019-02-06   5
2019-02-07   6
2019-02-08   7
2019-02-09   8
2019-02-10   9

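Alternatively, concatenate both frames and drop rows whose index label is duplicated, keeping the first occurrence (the one from A):

df1 = pd.concat([A, B])
df1 = df1[~df1.index.duplicated()].sort_index()
print(df1)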
             A
2019-01-27  10
2019-01-28  11
2019-01-29  12
2019-01-30  13
2019-01-31  14
2019-02-01   0
2019-02-02   1
2019-02-03   2
2019-02-04   3
2019-02-05   4
2019-02-06   5
2019-02-07   6
2019-02-08   7
2019-02-09   8
2019-02-10   9
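Since the question mentions doing this roughly 400 times, the Index.difference approach can be wrapped in a small helper and applied to each pair of frames. This is a minimal sketch; the function name fill_missing_rows and the pairs list are hypothetical stand-ins for however the frames are loaded from the database.

```python
import pandas as pd

def fill_missing_rows(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Keep every row of a; add only the rows of b whose index label is absent from a."""
    missing = b.index.difference(a.index)  # labels present in b but not in a
    return pd.concat([a, b.loc[missing]]).sort_index()

# Hypothetical (A, B) pairs, e.g. one per table in the database.
pairs = [
    (pd.DataFrame({'A': range(10)}, index=pd.date_range('2019-02-01', periods=10)),
     pd.DataFrame({'A': range(10, 20)}, index=pd.date_range('2019-01-27', periods=10))),
]

results = [fill_missing_rows(a, b) for a, b in pairs]
print(results[0]['A'].tolist())
```

With the data from the example above, this reproduces the same 15-row frame as both approaches shown.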

Upvotes: 2

fuglede

Reputation: 18201

If I'm reading the question correctly, what you want is

B[~B.index.isin(A.index)]

For example:

In [192]: A
Out[192]:
Empty DataFrame
Columns: []
Index: [1, 2, 4, 5]

In [193]: B
Out[193]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5]

In [194]: B[~B.index.isin(A.index)]
Out[194]:
Empty DataFrame
Columns: []
Index: [3]

To use the data from A when it's there, and otherwise take it from B, you could then do

pd.concat([A, B[~B.index.isin(A.index)]).sort_index()
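With some hypothetical values filled in (the indices mirror the empty frames above), the isin mask and the concatenation look like this: rows of A always win, and B contributes only the label that A lacks.

```python
import pandas as pd

# Hypothetical frames with the same indices as above: A lacks label 3.
A = pd.DataFrame({'val': [1.0, 2.0, 4.0, 5.0]}, index=[1, 2, 4, 5])
B = pd.DataFrame({'val': [10.0, 20.0, 30.0, 40.0, 50.0]}, index=[1, 2, 3, 4, 5])

mask = ~B.index.isin(A.index)  # True only where B's label is missing from A
combined = pd.concat([A, B[mask]]).sort_index()
print(combined)
```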

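or, assuming that A contains no null elements that you want to keep, and that B's index includes every date in A (rows of A whose dates are absent from B are dropped by the reindexing), you could take a different approach and go for something like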

pd.DataFrame(A, index=B.index).fillna(B)
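The fillna variant reindexes A onto B's index, so a date that exists only in A is lost. If that can happen, DataFrame.combine_first (a different method, not part of the original answer) takes the union of both indices and prefers A's non-null values. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical frames: A has 2019-01-05, which B lacks; B has 2019-01-03, which A lacks.
A = pd.DataFrame({'val': [1.0, 2.0, 4.0, 5.0]},
                 index=pd.to_datetime(['2019-01-01', '2019-01-02',
                                       '2019-01-04', '2019-01-05']))
B = pd.DataFrame({'val': [10.0, 20.0, 30.0, 40.0]},
                 index=pd.date_range('2019-01-01', periods=4))

# Reindexing A onto B's index drops 2019-01-05, which exists only in A.
via_fillna = pd.DataFrame(A, index=B.index).fillna(B)

# combine_first keeps the union of both indices; A's non-null values take precedence.
via_combine = A.combine_first(B)

print(len(via_fillna), len(via_combine))  # 4 5
```

Both variants share the caveat that a null in A is treated as missing and filled from B.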

Upvotes: 3
