Reputation: 516

Find key from value for Pandas Series

I have a dictionary whose values are lists of entries found in a pandas Series. For each entry in the Series, I want to find which list contains it and return the associated key, producing a new Series of keys. Example:

import pandas as pd

df = pd.DataFrame({'season' : ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                           'Swe 2014', 'Swe 2014',  'Swe 2013',
                           'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})

nmdict = {'Norway' : [s for s in list(set(df.season)) if 'No' in s],
                  'Sweden' : [s for s in list(set(df.season)) if 'S' in s]}

Desired result with df['country'] as the new column name:

       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway

Due to the nature of my data, I must build nmdict manually as shown. I've tried this, but couldn't reverse nmdict because its value lists are not all the same length.

More importantly, I think my approach may be wrong. I'm coming from Excel and thinking of a vlookup solution, but according to this answer, I shouldn't be using the dictionary in this way.

Any answers appreciated.

Upvotes: 3

Answers (3)

bakkal

Reputation: 55448

I've done it in a verbose manner to allow you to follow through.

First, let's define a function that determines the 'country' value:

In [4]: def get_country(s):
   ...:     if 'Nor' in s:
   ...:         return 'Norway'
   ...:     if 'S' in s:
   ...:         return 'Sweden'
   ...:     # return 'Default Country' # if you get unmatched values

In [5]: get_country('Sven')
Out[5]: 'Sweden'

In [6]: get_country('Norv')
Out[6]: 'Norway'

We can use map to run get_country on every row. pandas Series and DataFrames also have an apply() method, which works similarly*.

In [7]: list(map(get_country, df['season']))  # on Python 3, map returns a lazy iterator
Out[7]: 
['Norway',
 'Norway',
 'Norway',
 'Norway',
 'Sweden',
 'Sweden',
 'Sweden',
 'Sweden',
 'Sweden',
 'Sweden',
 'Norway']

Now we assign that result to the column called 'country'

In [8]: df['country'] = list(map(get_country, df['season']))

Let's view the final result:

In [9]: df
Out[9]: 
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway

*With apply() here's how it would look:

In [16]: df['country'] = df['season'].apply(get_country)

In [17]: df
Out[17]: 
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway

A more scalable country matcher

pseudo-code only :)

# Modify this as needed
country_matchers = {
    'Norway': ['Nor', 'Norv'],
    'Sweden': ['S', 'Swed'], 
}

def get_country(s):
    """
    Run the passed string s against "matchers" for each country
    Return the first matched country
    """
    for country, matchers in country_matchers.items():
        for matcher in matchers:
            if matcher in s:
                return country
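
One caveat with substring tests such as 'S' in s is that they match anywhere in the string, not only at the start. A minimal sketch of a prefix-based variant follows; the country_prefixes table, the 'Unknown' default, and the 'Den 2013' row are assumptions added for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical prefix table: each country maps to a tuple of known prefixes.
country_prefixes = {
    'Norway': ('Nor',),
    'Sweden': ('Swe', 'Sven'),
}

def get_country(s, default='Unknown'):
    """Return the first country that has a prefix s starts with, else default."""
    for country, prefixes in country_prefixes.items():
        # str.startswith accepts a tuple and tests each prefix in turn
        if s.startswith(prefixes):
            return country
    return default

# 'Den 2013' is an invented row, included only to show the fallback
df = pd.DataFrame({'season': ['Nor 2014', 'Norv 2013', 'Swe 2014',
                              'Sven 2013', 'Den 2013']})
df['country'] = df['season'].apply(get_country)
```

Returning an explicit default keeps unmatched rows visible instead of silently leaving None in the column.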

Upvotes: 2

Stefan

Reputation: 42875

You could create the country dictionary using a dictionary comprehension:

country_id = df.season.str.split().str.get(0).drop_duplicates()
country_dict = {c: ('Norway' if c.startswith('N') else 'Sweden') for c in country_id.values}

to get:

{'Nor': 'Norway', 'Swe': 'Sweden', 'Sven': 'Sweden', 'Norv': 'Norway'}

This works fine for two countries; for more, you can apply a self-defined function in a similar way:

def country_dict(country_id):
    if country_id.startswith('S'):
        return 'Sweden'
    elif country_id.startswith('N'):
        return 'Norway'
    elif country_id.startswith('XX'):
        return ...
    else:
        return 'default'

Either way, map the dictionary (or the function) onto the country_id part of the season column, extracted using pandas string methods:

df['country'] = df.season.str.split().str.get(0).map(country_dict)


       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
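
The same Series.map idea should also work with the hand-built nmdict from the question once it is inverted. The sketch below flattens the dictionary into a season-to-country lookup, so the value lists do not need to be the same length. The name season_to_country is an assumption introduced here.

```python
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                              'Swe 2014', 'Swe 2014', 'Swe 2013',
                              'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})

# The dictionary from the question: country -> list of season labels
nmdict = {'Norway': [s for s in set(df.season) if 'No' in s],
          'Sweden': [s for s in set(df.season) if 'S' in s]}

# Invert to season label -> country; each label becomes its own key,
# so lists of different lengths are not a problem
season_to_country = {season: country
                     for country, seasons in nmdict.items()
                     for season in seasons}

# Series.map looks up each season label in the inverted dictionary
df['country'] = df['season'].map(season_to_country)
```

Labels that are missing from the inverted dictionary come back as NaN, which makes gaps in a manually built mapping easy to spot.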

Upvotes: 1

Fabio Lamanna

Reputation: 21552

IIUC, I would do the following:

df['country'] = df['season'].apply(lambda x: 'Norway' if 'No' in x else 'Sweden' if 'S' in x else x)
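
For larger frames, the same conditions can be written without a Python-level lambda. This is a sketch using str.contains and numpy.select, on a shortened copy of the data. Unlike the lambda above, it falls back to a fixed 'Unknown' label (an assumption) rather than to the original value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Norv 2013', 'Swe 2014', 'Sven 2013']})

# Boolean masks, checked in order; the first condition that is True wins
conditions = [
    df['season'].str.contains('No', regex=False).to_numpy(),
    df['season'].str.contains('S', regex=False).to_numpy(),
]
choices = ['Norway', 'Sweden']

# 'Unknown' is an assumed fallback label for rows matching no condition
df['country'] = np.select(conditions, choices, default='Unknown')
```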

Upvotes: 1
