Looking up multiple values from a pandas DataFrame

Question

I have been struggling to find an elegant way of looking up multiple values from a pandas DataFrame. Assume we have a dataframe df that holds the “result” R, that depends on multiple index keys, and we have another dataframe keys where each row is a list of values to look up from df. The problem is to loop over the keys and look up the corresponding value from df. If the value does not exist in df, I expect to get a np.nan.

So far I have come up with three different methods, but I feel that all of them lack elegance. So my question is there another prettier method for multiple lookups? Note that the three methods below all give the same result.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':range(5),
                   'B':range(10,15),
                   'C':range(100,105),
                   'R':np.random.rand(5)}).set_index(['A','B','C'])

print 'df'
print df

keys = pd.DataFrame({'A':[0,0,5],'B':[10,10,10],'C':[100,100,100]})
print '--'
print 'keys'
print keys

# By merge 
print '--'
print pd.merge(df.reset_index(), keys, on=['A','B','C'],how='right').reset_index().R

# By reindex
print '--'
print df.reindex(keys.set_index(['A','B','C']).index).reset_index().R

# By apply
print '--'
print keys.apply(lambda s : df.R.get((s.A,s.B,s.C)),axis=1).to_frame('R').R

dermen · Accepted Answer

I think update is pretty.

result = keys.set_index( ['A','B','C']) # looks like R
result['R'] = pd.np.nan # add nan

Them use update

result.update(df)
                 R
A B  C            
0 10 100  0.068085
     100  0.068085
5 10 100       NaN

Looking up multiple values from a pandas DataFrame

Answers (2)

Related Questions