lookup first match in Pandas dataframe

Question

I need to use a DataFrame as a lookup table on columns that are not part of the index. For example (this is a simple one just to illustrate):

import pandas as pd

westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
                          ['California','Sacramento']],
                        columns=['state','capital'])
print(westcoast)

        state     capital
0  Washington     Olympia
1      Oregon       Salem
2  California  Sacramento

It's easy to look up a value and get a Series as output:

westcoast[westcoast.state=='Oregon'].capital

1    Salem
Name: capital, dtype: object

but I want to obtain the string 'Salem':

westcoast[westcoast.state=='Oregon'].capital.values[0]

'Salem'

and the .values[0] seems somewhat clunky... is there a better way?

(FWIW: my real data has maybe 50 rows at most, but lots of columns, so if I do set an index column, no matter what column I choose, there will be a lookup operation like this that is not based on an index, and the relatively small number of rows means that I don't care if it's O(n) lookup.)

jezrael · Accepted Answer

Yes, you can use Series.item if the lookup always returns exactly one element from the Series:

westcoast.loc[westcoast.state=='Oregon', 'capital'].item()
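
A minimal sketch of why the one-element assumption matters: Series.item raises ValueError when the selection does not hold exactly one element. The helper name item_or_error is hypothetical and exists only for this illustration.

```python
import pandas as pd

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'],
     ['California', 'Sacramento']],
    columns=['state', 'capital'])

def item_or_error(state):
    """Hypothetical helper: return the capital, or describe the error .item() raised."""
    try:
        return westcoast.loc[westcoast.state == state, 'capital'].item()
    except ValueError as exc:
        # Raised when the selection is empty or has more than one element.
        return 'ValueError: {}'.format(exc)

print(item_or_error('Oregon'))    # exactly one match, so the scalar is returned
print(item_or_error('New York'))  # no match, so the error text is returned
```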

If the lookup can return nothing, or more than one value, and you need only the first item, handle the empty case explicitly:

import numpy as np

s = westcoast.loc[westcoast.state=='Oregon', 'capital']
s = np.nan if s.empty else s.iat[0]
print (s) #Salem

s = westcoast.loc[westcoast.state=='New York', 'capital']
s = np.nan if s.empty else s.iat[0]
print (s)
nan
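
The first-match pattern above can be wrapped in a small reusable helper. This is only a sketch: the function name first_match and its parameters (df, key_col, key, value_col, default) are assumptions made for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

def first_match(df, key_col, key, value_col, default=np.nan):
    """Return value_col from the first row where key_col == key, else default."""
    s = df.loc[df[key_col] == key, value_col]
    # .iat[0] takes the first element by position, ignoring index labels.
    return default if s.empty else s.iat[0]

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'],
     ['California', 'Sacramento']],
    columns=['state', 'capital'])

print(first_match(westcoast, 'state', 'Oregon', 'capital'))
print(first_match(westcoast, 'capital', 'Sacramento', 'state'))
print(first_match(westcoast, 'state', 'New York', 'capital', default=None))
```

Because the key and value columns are arguments, the same helper covers lookups on any non-index column, which is the situation described in the question.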

A more general solution handles all three possible outcomes (no match, exactly one match, several matches):

westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
                          ['California','Sacramento'],['Oregon','Portland']],
                        columns=['state','capital'])

print (westcoast)
        state     capital
0  Washington     Olympia
1      Oregon       Salem
2  California  Sacramento
3      Oregon    Portland

s = westcoast.loc[westcoast.state=='Oregon', 'capital']

#if no value returned
if s.empty:
    s = 'no match'
#if only one value returned
elif len(s) == 1:
    s = s.item()
#if multiple values returned, return a list of values
else:
    s = s.tolist()

print (s)
['Salem', 'Portland']

It is possible to create a lookup function:

def look_up(a):
    s = westcoast.loc[westcoast.state==a, 'capital']
    #for no match
    if s.empty:
        return np.nan
    #for exactly one match
    elif len(s) == 1:
        return s.item()
    #for multiple matches, return a list
    else:
        return s.tolist()

print (look_up('Oregon'))
['Salem', 'Portland']

print (look_up('California'))
Sacramento

print (look_up('New York'))
nan
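
Note that look_up returns three different types (float, str, or list), so callers have to check what they got back. One possible variation, sketched here under the hypothetical name look_up_all, is to always return a list and let an empty list signal no match.

```python
import pandas as pd

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'],
     ['California', 'Sacramento'], ['Oregon', 'Portland']],
    columns=['state', 'capital'])

def look_up_all(a):
    """Hypothetical variant: always return a list of matching capitals."""
    return westcoast.loc[westcoast.state == a, 'capital'].tolist()

print(look_up_all('Oregon'))      # several matches
print(look_up_all('California'))  # single match, still wrapped in a list
print(look_up_all('New York'))    # no match gives an empty list
```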
