Reputation: 4983
I have been struggling to find an elegant way of looking up multiple values from a pandas DataFrame. Assume we have a dataframe df
that holds the “result” R
, that depends on multiple index keys, and we have another dataframe keys
where each row is a list of values to look up from df
. The problem is to loop over the keys and look up the corresponding value from df
. If the value does not exist in df
, I expect to get a np.nan
.
So far I have come up with three different methods, but I feel that all of them lack elegance. So my question is there another prettier method for multiple lookups? Note that the three methods below all give the same result.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':range(5),
'B':range(10,15),
'C':range(100,105),
'R':np.random.rand(5)}).set_index(['A','B','C'])
print 'df'
print df
keys = pd.DataFrame({'A':[0,0,5],'B':[10,10,10],'C':[100,100,100]})
print '--'
print 'keys'
print keys
# By merge
print '--'
print pd.merge(df.reset_index(), keys, on=['A','B','C'],how='right').reset_index().R
# By reindex
print '--'
print df.reindex(keys.set_index(['A','B','C']).index).reset_index().R
# By apply
print '--'
print keys.apply(lambda s : df.R.get((s.A,s.B,s.C)),axis=1).to_frame('R').R
Upvotes: 0
Views: 681
Reputation: 4983
I found an even simpler solution:
keys = (pd.DataFrame({'A':[0,0,5],'B':[10,10,10],'C':[100,100,100]})
.set_index(['A','B','C']))
keys['R'] = df
or similarly (and more chaining compatible):
keys.assign(R = df)
That's all that is needed. The automatic alignment of the index does the rest of the work! :-)
Upvotes: 0
Reputation: 5372
I think update
is pretty.
result = keys.set_index( ['A','B','C']) # looks like R
result['R'] = pd.np.nan # add nan
Them use update
result.update(df)
R
A B C
0 10 100 0.068085
100 0.068085
5 10 100 NaN
Upvotes: 1