selecting rows from dataframe based on another series index

Question

I have a DataFrame, df, with MultiIndex of (id1, id2, date) and a bunch of columns. I have a Series, s, that has (id1, id2) as index and date as the values. I would like to select all the rows in df where (id1, id2, date) match the corresponding values in s. What is the best way to do this?

I thought I could reset index on s and df and then use the columns as the join columns doing a right outer merge. This isn't great as it requires resetting index which is hashed so the merge is faster using that than the columns. It doesn't seem possible to have a Series with no values (and just have index be (id1, id2, date)) which would have been ideal in this case. Any other ideas?

jezrael · Accepted Answer

You can merge by columns name and index in pandas 0.23.0+:

print (df.merge(s.to_frame(), on=['id1','id2','date']))

Sample:

df = pd.DataFrame({'date':pd.date_range('2015-01-01', periods=6),
                   'id1':[4,5,4,5,5,4],
                   'id2':[7,8,9,4,2,3],
                   'F':list('aaabbb')}).set_index(['id1','id2','date'])

print (df)
                    F
id1 id2 date         
4   7   2015-01-01  a
5   8   2015-01-02  a
4   9   2015-01-03  a
5   4   2015-01-04  b
    2   2015-01-05  b
4   3   2015-01-06  b

s = pd.DataFrame({'date':pd.date_range('2015-01-01', periods=3), 
                  'id1':[4,5,0], 
                  'id2':[7,8,2]}).set_index(['id1','id2'])['date']
print (s)
id1  id2
4    7     2015-01-01
5    8     2015-01-02
0    2     2015-01-03
Name: date, dtype: datetime64[ns]

df1 = df.merge(s.to_frame(), on=['id1','id2','date'])
print (df1)
              date  F
id1 id2              
4   7   2015-01-01  a
5   8   2015-01-02  a

Another solution:

df1 = df.reset_index().merge(s.reset_index(), on=['id1','id2','date'])
print (df1)
   id1  id2       date  F
0    4    7 2015-01-01  a
1    5    8 2015-01-02  a

Solution with reindex is possible, but need intersection of MultiIndexes:

s_index = s.to_frame().assign(tmp=1).set_index('date', append=True).index
idx = df.index.intersection(s_index)
df1 = df.reindex(idx)
print (df1)

                    F
id1 id2 date         
4   7   2015-01-01  a
5   8   2015-01-02  a

selecting rows from dataframe based on another series index

Answers (1)

Related Questions