Reputation: 189826
I need to use a DataFrame as a lookup table on columns that are not part of the index. For example (this is a simple one just to illustrate):
import pandas as pd
westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
['California','Sacramento']],
columns=['state','capital'])
print westcoast
state capital
0 Washington Olympia
1 Oregon Salem
2 California Sacramento
It's easy to lookup and get a Series as an output:
westcoast[westcoast.state=='Oregon'].capital
1 Salem
Name: capital, dtype: object
but I want to obtain the string 'Salem':
westcoast[westcoast.state=='Oregon'].capital.values[0]
'Salem'
and the .values[0]
seems somewhat clunky... is there a better way?
(FWIW: my real data has maybe 50 rows at most, but lots of columns, so if I do set an index column, no matter what column I choose, there will be a lookup operation like this that is not based on an index, and the relatively small number of rows means that I don't care if it's O(n) lookup.)
Upvotes: 5
Views: 9806
Reputation: 863281
Yes, you can use Series.item
if the lookup will always returns one element from the Series
:
westcoast.loc[westcoast.state=='Oregon', 'capital'].item()
Exceptions can be handled if the lookup returns nothing, or one or more values and you need only the first item:
s = westcoast.loc[westcoast.state=='Oregon', 'capital']
s = np.nan if s.empty else s.iat[0]
print (s) #Salem
s = westcoast.loc[westcoast.state=='New York', 'capital']
s = np.nan if s.empty else s.iat[0]
print (s)
nan
A more general solution to handle the exceptions because there are 3 possible output scenarios:
westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
['California','Sacramento'],['Oregon','Portland']],
columns=['state','capital'])
print (westcoast)
state capital
0 Washington Olympia
1 Oregon Salem
2 California Sacramento
3 Oregon Portland
s = westcoast.loc[westcoast.state=='Oregon', 'capital']
#if not value returned
if s.empty:
s = 'no match'
#if only one value returned
elif len(s) == 1:
s = s.item()
else:
# if multiple values returned, return a list of values
s = s.tolist()
print (s)
['Salem', 'Portland']
It is possible to create a lookup function:
def look_up(a):
s = westcoast.loc[westcoast.state==a, 'capital']
#for no match
if s.empty:
return np.nan
#for match only one value
elif len(s) == 1:
return s.item()
else:
#for return multiple values
return s.tolist()
print (look_up('Oregon'))
['Salem', 'Portland']
print (look_up('California'))
Sacramento
print (look_up('New Yourk'))
nan
Upvotes: 6
Reputation: 880429
If you are going to do frequent lookups of this sort, then it pays to make state
the index:
state_capitals = westcoast.set_index('state')['capital']
print(state_capitals['Oregon'])
# Salem
With an index, each lookup is O(1)
on average, whereas westcoast['state']=='Oregon'
requires O(n)
comparisons. Of course, building the index is also O(n)
, so you would need to do many lookups for this to pay off.
At the same time, once you have state_capitals
the syntax is simple and dict-like. That might be reason enough to build state_capitals
.
Upvotes: 0