Dov Grobgeld
Dov Grobgeld

Reputation: 4983

Looking up multiple values from a pandas DataFrame

I have been struggling to find an elegant way of looking up multiple values from a pandas DataFrame. Assume we have a dataframe df that holds the “result” R, that depends on multiple index keys, and we have another dataframe keys where each row is a list of values to look up from df. The problem is to loop over the keys and look up the corresponding value from df. If the value does not exist in df, I expect to get a np.nan.

So far I have come up with three different methods, but I feel that all of them lack elegance. So my question is there another prettier method for multiple lookups? Note that the three methods below all give the same result.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':range(5),
                   'B':range(10,15),
                   'C':range(100,105),
                   'R':np.random.rand(5)}).set_index(['A','B','C'])

print 'df'
print df

keys = pd.DataFrame({'A':[0,0,5],'B':[10,10,10],'C':[100,100,100]})
print '--'
print 'keys'
print keys

# By merge 
print '--'
print pd.merge(df.reset_index(), keys, on=['A','B','C'],how='right').reset_index().R

# By reindex
print '--'
print df.reindex(keys.set_index(['A','B','C']).index).reset_index().R

# By apply
print '--'
print keys.apply(lambda s : df.R.get((s.A,s.B,s.C)),axis=1).to_frame('R').R

Upvotes: 0

Views: 681

Answers (2)

Dov Grobgeld
Dov Grobgeld

Reputation: 4983

I found an even simpler solution:

keys = (pd.DataFrame({'A':[0,0,5],'B':[10,10,10],'C':[100,100,100]})
        .set_index(['A','B','C']))
keys['R'] = df

or similarly (and more chaining compatible):

keys.assign(R = df)

That's all that is needed. The automatic alignment of the index does the rest of the work! :-)

Upvotes: 0

dermen
dermen

Reputation: 5372

I think update is pretty.

result = keys.set_index( ['A','B','C']) # looks like R
result['R'] = pd.np.nan # add nan

Them use update

result.update(df)
                 R
A B  C            
0 10 100  0.068085
     100  0.068085
5 10 100       NaN

Upvotes: 1

Related Questions