Reputation: 1192
I have a dataframe d
with about 100,000,000
rows and 3
columns. It looks something like this:
import pandas as pd
In [17]: d = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'], 'val': [1, 2, 3, 4, 5], 'n': [34, 22, 95, 86, 44]})
In [18]: d.set_index(['id', 'val'], inplace = True)
I have another dataframe with values of id
and val
that I want to keep in d
. There are around 600,000 combinations of id
and val
that I want to keep:
In [20]: keep = pd.DataFrame({'id':['a', 'b'], 'val' : [1, 2]})
I have tried this in the following way:
In [21]: keep.set_index(['id', 'val'], inplace = True)
In [22]: d.loc[d.index.isin(keep.index), :]
Out [22]:
n
id val
a 1 34
b 2 22
This works but seems clunky and is very slow. Is there a better approach here? What is the fastest way to slice on Multindex in pandas?
Upvotes: 4
Views: 151
Reputation: 403050
Use loc
to index d
using keep
:
# d.loc[pd.MultiIndex.from_arrays([keep['id'], keep['val']]))]
d.reindex(pd.MultiIndex.from_arrays([keep['id'], keep['val']])))
n
id val
a 1 34
b 2 22
Where,
pd.MultiIndex.from_arrays([keep['id'], keep['val']])
MultiIndex(levels=[['a', 'b'], [1, 2]],
codes=[[0, 1], [0, 1]],
names=['id', 'val'])
Generates a MultiIndex from keep
that is used to filter.
Another option is to try merging with df.merge
:
d.merge(keep, left_index=True, right_on=['id', 'val'])
n id val
0 34 a 1
1 22 b 2
d.merge(keep, left_index=True, right_on=['id', 'val']).set_index(['n', 'id'])
val
n id
34 a 1
22 b 2
Or even,
d.join(keep.set_index(['id', 'val']), how='inner')
n
id val
a 1 34
b 2 22
Upvotes: 3
Reputation: 323366
Using reindex
d.reindex(pd.MultiIndex.from_frame(keep))
Out[151]:
n
id val
a 1 34
b 2 22
Upvotes: 5
Reputation: 294506
loc
takes a list of tuples for referencing a MultiIndex
d.loc[[*keep.itertuples(index=False)]]
n
id val
a 1 34
b 2 22
A more obnoxious way to do the same thing. (Not actually a recommendation)
d.loc[[*zip(*map(keep.get, keep))]]
n
id val
a 1 34
b 2 22
The advantage is 3 less characters. You only have to sacrifice understanding what is going on.
Upvotes: 5