Reputation: 19803
I have a DataFrame
, df, with MultiIndex
of (id1, id2, date)
and a bunch of columns. I have a Series
, s, that has (id1, id2)
as index and date
as the values. I would like to select all the rows in df where (id1, id2, date)
match the corresponding values in s. What is the best way to do this?
I thought I could reset index on s
and df
and then use the columns as the join columns doing a right outer merge
. This isn't great as it requires resetting index which is hashed so the merge is faster using that than the columns. It doesn't seem possible to have a Series
with no values (and just have index be (id1, id2, date)
) which would have been ideal in this case. Any other ideas?
Upvotes: 2
Views: 330
Reputation: 862521
You can merge
by columns name and index in pandas 0.23.0+:
print (df.merge(s.to_frame(), on=['id1','id2','date']))
Sample:
df = pd.DataFrame({'date':pd.date_range('2015-01-01', periods=6),
'id1':[4,5,4,5,5,4],
'id2':[7,8,9,4,2,3],
'F':list('aaabbb')}).set_index(['id1','id2','date'])
print (df)
F
id1 id2 date
4 7 2015-01-01 a
5 8 2015-01-02 a
4 9 2015-01-03 a
5 4 2015-01-04 b
2 2015-01-05 b
4 3 2015-01-06 b
s = pd.DataFrame({'date':pd.date_range('2015-01-01', periods=3),
'id1':[4,5,0],
'id2':[7,8,2]}).set_index(['id1','id2'])['date']
print (s)
id1 id2
4 7 2015-01-01
5 8 2015-01-02
0 2 2015-01-03
Name: date, dtype: datetime64[ns]
df1 = df.merge(s.to_frame(), on=['id1','id2','date'])
print (df1)
date F
id1 id2
4 7 2015-01-01 a
5 8 2015-01-02 a
Another solution:
df1 = df.reset_index().merge(s.reset_index(), on=['id1','id2','date'])
print (df1)
id1 id2 date F
0 4 7 2015-01-01 a
1 5 8 2015-01-02 a
Solution with reindex
is possible, but need intersection of MultiIndex
es:
s_index = s.to_frame().assign(tmp=1).set_index('date', append=True).index
idx = df.index.intersection(s_index)
df1 = df.reindex(idx)
print (df1)
F
id1 id2 date
4 7 2015-01-01 a
5 8 2015-01-02 a
Upvotes: 2