Reputation: 516
I have a dictionary whose values are lists of entries from a pandas Series. I want to look up each value of the Series in that dictionary and build a new Series holding the associated key. Example:
import pandas as pd
df = pd.DataFrame({'season' : ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                               'Swe 2014', 'Swe 2014', 'Swe 2013',
                               'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})
nmdict = {'Norway' : [s for s in list(set(df.season)) if 'No' in s],
          'Sweden' : [s for s in list(set(df.season)) if 'S' in s]}
Desired result, with df['country'] as the new column name:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
Due to the nature of my data I must build nmdict manually, as shown. I've tried this but couldn't reverse my nmdict, as the arrays are not the same length.
More importantly, I think my approach may be wrong. I'm coming from Excel and thinking of a vlookup solution, but according to this answer, I shouldn't be using the dictionary in this way.
Any answers appreciated.
Upvotes: 3
Views: 8971
Reputation: 55448
I've done it in a verbose manner to allow you to follow through.
First, let's define a function that determines the value of 'country':
In [4]: def get_country(s):
   ...:     if 'Nor' in s:
   ...:         return 'Norway'
   ...:     if 'S' in s:
   ...:         return 'Sweden'
   ...:     # return 'Default Country' # if you get unmatched values
In [5]: get_country('Sven')
Out[5]: 'Sweden'
In [6]: get_country('Norv')
Out[6]: 'Norway'
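We can use map to run get_country on every value in the column. A pandas Series also has an apply() method, which works similarly*. In Python 3, map returns a lazy iterator, so it is wrapped in list() here:
In [7]: list(map(get_country, df['season']))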
Out[7]:
['Norway',
'Norway',
'Norway',
'Norway',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Sweden',
'Norway']
Now we assign that result to the column called 'country':
In [8]: df['country'] = list(map(get_country, df['season']))
Let's view the final result:
In [9]: df
Out[9]:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
*With apply(), here's how it would look:
In [16]: df['country'] = df['season'].apply(get_country)
In [17]: df
Out[17]:
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
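For reference, the steps above can be collected into one script. This is a minimal sketch, assuming Python 3 and the df from the question. Series.map is not used in the answer above; it is included only as a further option that also accepts a function.

```python
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                              'Swe 2014', 'Swe 2014', 'Swe 2013',
                              'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})


def get_country(s):
    """Return the country for a season label, or None if nothing matches."""
    if 'Nor' in s:
        return 'Norway'
    if 'S' in s:
        return 'Sweden'
    return None


# Built-in map is lazy in Python 3, so materialise it before using it.
via_builtin_map = list(map(get_country, df['season']))

# Series.apply and Series.map both call the function once per value.
via_apply = df['season'].apply(get_country)
via_series_map = df['season'].map(get_country)

df['country'] = via_apply
print(df)
```

All three variants should give the same values here; the Series methods return a Series that keeps the original index, which matters once the DataFrame has a non-default index.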
Pseudo-code only :)
# Modify this as needed
country_matchers = {
    'Norway': ['Nor', 'Norv'],
    'Sweden': ['S', 'Swed'],
}
def get_country(s):
    """
    Run the passed string s against "matchers" for each country
    Return the first matched country
    """
    for country, matchers in country_matchers.items():
        for matcher in matchers:
            if matcher in s:
                return country
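As a usage sketch for the matcher idea above, assuming Python 3.7+ (so the dictionary is scanned in insertion order). The 'Fin 2014' label and the 'Unknown' fallback are hypothetical, added only to show what could be done with an unmatched value.

```python
import pandas as pd

country_matchers = {
    'Norway': ['Nor', 'Norv'],
    'Sweden': ['S', 'Swed'],
}


def get_country(s, default='Unknown'):
    """Return the first country with a matcher found in s, else default."""
    for country, matchers in country_matchers.items():
        if any(matcher in s for matcher in matchers):
            return country
    return default


# 'Fin 2014' is a made-up label with no matcher, so it falls back to default.
seasons = pd.Series(['Nor 2014', 'Sven 2013', 'Norv 2013', 'Fin 2014'])
countries = seasons.apply(get_country)
print(countries.tolist())
```

Because matching is by substring, a short matcher such as 'S' will also hit any other label containing a capital S, so more specific matchers are probably safer once more countries are added.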
Upvotes: 2
Reputation: 42875
You could create the country dictionary using a dictionary comprehension:
country_id = df.season.str.split().str.get(0).drop_duplicates()
country_dict = {c: ('Norway' if c.startswith('N') else 'Sweden') for c in country_id.values}
to get:
{'Nor': 'Norway', 'Swe': 'Sweden', 'Sven': 'Sweden', 'Norv': 'Norway'}
This works fine for two countries; otherwise you can apply a self-defined function in a similar way:
def country_dict(country_id):
    if country_id.startswith('S'):
        return 'Sweden'
    elif country_id.startswith('N'):
        return 'Norway'
    elif country_id.startswith('XX'):
        return ...
    else:
        return 'default'
Either way, map the dictionary to the country_id part of the season column, extracted using pandas string methods:
df['country'] = df.season.str.split().str.get(0).map(country_dict)
       season country
0    Nor 2014  Norway
1    Nor 2013  Norway
2    Nor 2013  Norway
3   Norv 2013  Norway
4    Swe 2014  Sweden
5    Swe 2014  Sweden
6    Swe 2013  Sweden
7    Swe 2013  Sweden
8   Sven 2013  Sweden
9   Sven 2013  Sweden
10  Norv 2014  Norway
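If the hand-built nmdict from the question has to be kept, one possible route is to invert it first and then map the full season label. This is a sketch under that assumption; the name season_to_country is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Nor 2013', 'Nor 2013', 'Norv 2013',
                              'Swe 2014', 'Swe 2014', 'Swe 2013',
                              'Swe 2013', 'Sven 2013', 'Sven 2013', 'Norv 2014']})

# Hand-built mapping from the question: country -> list of season labels.
nmdict = {'Norway': [s for s in set(df.season) if 'No' in s],
          'Sweden': [s for s in set(df.season) if 'S' in s]}

# Invert it to season label -> country; lists of unequal length are fine.
season_to_country = {season: country
                     for country, seasons in nmdict.items()
                     for season in seasons}

# Series.map with a dict is an exact-match lookup, much like VLOOKUP.
df['country'] = df['season'].map(season_to_country)
print(df)
```

Labels that are missing from the inverted dictionary come back as NaN, which makes gaps in a manually maintained mapping easy to spot.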
Upvotes: 1
Reputation: 21552
IIUC, I would do the following:
df['country'] = df['season'].apply(lambda x: 'Norway' if 'No' in x else 'Sweden' if 'S' in x else x)
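The chained conditional gets harder to read as countries are added. As an alternative that is not part of the answer above, the same rules can be sketched with Series.str.contains and Series.mask. The shortened df and the 'Fin 2014' label are hypothetical, used only to show that unmatched values are left unchanged, as in the lambda.

```python
import pandas as pd

df = pd.DataFrame({'season': ['Nor 2014', 'Norv 2013', 'Swe 2014',
                              'Sven 2013', 'Fin 2014']})

season = df['season']

# Apply the lower-priority rule first so the 'No' rule can override it,
# mirroring the order of checks in the lambda.
country = season.mask(season.str.contains('S', regex=False), 'Sweden')
country = country.mask(season.str.contains('No', regex=False), 'Norway')

df['country'] = country
print(df)
```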
Upvotes: 1