Zarathustra

Reputation: 145

Grouping a dataframe using dictionary

I've got a dataframe with country names as the row index, and a dictionary of country/continent pairs as follows:

ContinentDict = {'China': 'Asia',
                 'United States': 'North America',
                 'Japan': 'Asia',
                 'United Kingdom': 'Europe',
                 'Russian Federation': 'Europe',
                 'Canada': 'North America',
                 'Germany': 'Europe',
                 'India': 'Asia',
                 'France': 'Europe',
                 'South Korea': 'Asia',
                 'Italy': 'Europe',
                 'Spain': 'Europe',
                 'Iran': 'Asia',
                 'Australia': 'Australia',
                 'Brazil': 'South America'}

I want to use the groupby function to group my dataframe according to these continents. I've thought about merging the continents as an additional column to the dataframe, but that seems clunky. What would be best practice in this case?

Thanks!

PS: I'm generally a bit confused about dictionaries in Python and how to use them together with dataframes.

Edit: My original dataframe with the countries has columns with some statistics on population. The next step in my workflow after grouping by continent is to calculate the mean, std dev. etc. for each continent.

Upvotes: 2

Views: 606

Answers (2)

Matthew Borish

Reputation: 3086

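import pandas as pd

# Build a table with countries as the index and the continent in column 0
df = pd.DataFrame(ContinentDict, index=range(len(ContinentDict))).drop_duplicates().T
df['country'] = df.index
df.rename(columns={0: 'continent'}, inplace=True)
# Join the country names belonging to each continent
df_gb = df.groupby('continent', as_index=False, sort=False).agg(','.join)

print(df_gb)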

   continent      country
0  Asia           China,Japan,India,South Korea,Iran
1  North America  United States,Canada
2  Europe         United Kingdom,Russian Federation,Germany,Fran...
3  Australia      Australia
4  South America  Brazil

Upvotes: 2

wwnde

Reputation: 26676

You can do the following and inspect the result with .groups, which gives you the groups and their indices. The only downside of passing a Series to groupby is that the Series has to be the same length as the dataframe.

import pandas as pd

df = pd.DataFrame(list(ContinentDict.items()))  # dict to dataframe
df.columns = ['Country', 'Continent']  # name the dataframe columns
df.groupby('Continent').groups  # group by continent and get the groups
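
For the statistics step mentioned in the question's edit, one possible sketch is to pass the dictionary itself to groupby, which maps the row index labels to continents without adding a column. The `stats` dataframe, its `population` column and its values are made up for illustration, since the original data isn't shown, and the dictionary is shortened.

```python
import pandas as pd

# Subset of the question's dictionary, kept short for the example
ContinentDict = {'China': 'Asia', 'Japan': 'Asia',
                 'United States': 'North America', 'Canada': 'North America',
                 'Germany': 'Europe', 'France': 'Europe'}

# Hypothetical stand-in for the original dataframe: countries as the row
# index and one numeric column (values are illustrative, not real data)
stats = pd.DataFrame({'population': [10.0, 20.0, 30.0, 50.0, 40.0, 60.0]},
                     index=list(ContinentDict))

# A dict passed to groupby is looked up against the index labels,
# so each row is assigned to the continent of its country
by_continent = stats.groupby(ContinentDict)['population'].agg(['mean', 'std'])
print(by_continent)
```

With the real dataframe, the same call should work as long as the index labels match the dictionary keys exactly.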

Upvotes: 2
