Reputation: 361
I am building a dataframe df
like this
df = pd.DataFrame({
'South East Asia': ['India', 'Sri Lanka', 'Mali', 'Maldives'],
'Europe': ['France', 'Italy', 'Germany', 'Spain'],
'North America': ['Mexico', 'US', 'Canada', np.nan]},
columns=['South East Asia', 'Europe', 'North America'])
Which looks like this:
South East Asia Europe North America
0 India France Mexico
1 Sri Lanka Italy US
2 Mali Germany Canada
3 Maldives Spain NaN
I have a customer data (~40K rows) in another dataframe df2
with a column Country
with countries from all the 11 countries listed above.
I want to make a new column in df2
called Continent which would have the corresponding continent as mentioned in my dataframe df
. So my desired output for
print(df2.loc[:, 'Country':'Continent'])
would be
Country Continent
0 France Europe
1 Italy Europe
2 Mexico North America
...
Note: While looking up this query online, I found that there are some ready-made tools to categories countries into continents, which is understandable since this must be required frequently within the community. But I want to learn how to categorise data based on datasets in another dataframes. That's why I'm trying to achieve the same in only these 11 countries and 3 continents.
Thanks in advance!
Upvotes: 2
Views: 654
Reputation: 164703
Using pandas.melt
, you can create a series mapping country to continent:
s = pd.melt(df).set_index('value')['variable']
df2['Continent'] = df2['Country'].map(s).fillna('DefaultContinent')
For NaN
, you may want to take extra care. For example:
mapper = pd.melt(df).dropna(subset=['value']).set_index('value')['variable']
Upvotes: 1
Reputation: 862851
Use dict comprehension with flattening for dictionary:
s = {k:v for a, b in df.items() for k, v in dict.fromkeys(b, a).items()}
#if want remove NaNs
#s = {k:v for a, b in df.items() for k, v in dict.fromkeys(b, a).items() if pd.notnull(k)}
Or create Series
with stack
:
s = df.stack().reset_index().set_index(0)['level_1']
print (s)
India South East Asia
France Europe
Mexico North America
Sri Lanka South East Asia
Italy Europe
US North America
Mali South East Asia
Germany Europe
Canada North America
Maldives South East Asia
Spain Europe
Name: level_1, dtype: object
and last map
:
df2['Continent'] = df2['Country'].map(s)
Upvotes: 3