Reputation: 145
I've got a dataframe with country names as the row index, and a dictionary mapping each country to its continent:
ContinentDict = {'China':'Asia',
'United States':'North America',
'Japan':'Asia',
'United Kingdom':'Europe',
'Russian Federation':'Europe',
'Canada':'North America',
'Germany':'Europe',
'India':'Asia',
'France':'Europe',
'South Korea':'Asia',
'Italy':'Europe',
'Spain':'Europe',
'Iran':'Asia',
'Australia':'Australia',
'Brazil':'South America'}
I want to use the groupby function to group my dataframe according to these continents. I've thought about merging the continents as an additional column to the dataframe, but that seems clunky. What would be best practice in this case?
Thanks!
PS: I'm generally a bit confused about how dictionaries are used in Python and how to use them together with dataframes.
Edit: My original dataframe with the countries has columns with some population statistics. After grouping by continent, the next step in my workflow is to calculate the mean, standard deviation, etc. for each continent.
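One way to avoid adding a column at all: pandas groupby also accepts a dict. It looks up each index label in the dict and uses the result (here the continent) as the group key. Below is a minimal sketch. It assumes a dataframe indexed by country name with a hypothetical Population column. The numbers are made up for illustration, and only part of ContinentDict is used.

```python
import pandas as pd

# Part of the question's mapping (country -> continent)
ContinentDict = {'China': 'Asia', 'Japan': 'Asia', 'India': 'Asia',
                 'Germany': 'Europe', 'France': 'Europe',
                 'Canada': 'North America', 'United States': 'North America'}

# Hypothetical population figures (millions), indexed by country name
df = pd.DataFrame(
    {'Population': [1400.0, 126.0, 1380.0, 83.0, 67.0, 38.0, 331.0]},
    index=['China', 'Japan', 'India', 'Germany', 'France', 'Canada', 'United States'],
)

# Passing the dict to groupby maps every index label through it,
# so the continent never has to be stored as a column
stats = df.groupby(ContinentDict)['Population'].agg(['mean', 'std', 'count'])
print(stats)
```

Countries that are missing from the dict get no group key and are left out of the result, because groupby drops missing keys by default.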
Upvotes: 2
Views: 606
Reputation: 3086
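import pandas as pd

# Every row is identical (each country's value is broadcast down the index),
# so drop_duplicates keeps a single row; transposing makes the countries the index
df = pd.DataFrame(ContinentDict, index=range(len(ContinentDict))).drop_duplicates().T
df['country'] = df.index  # copy the country names into a regular column
df.rename(columns={0: 'continent'}, inplace=True)  # the remaining data column is labelled 0
# One row per continent with its countries joined into a string; sort=False keeps first-seen order
df_gb = df.groupby('continent', as_index=False, sort=False).agg(','.join)
print(df_gb)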
continent country
0 Asia China,Japan,India,South Korea,Iran
1 North America United States,Canada
2 Europe United Kingdom,Russian Federation,Germany,Fran...
3 Australia Australia
4 South America Brazil
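A dict can also be turned into a Series directly, with the keys as the index and the values as the data. This gives a shorter route to the same kind of country/continent table. The sketch below is an alternative construction, not the answerer's code, and uses only part of ContinentDict.

```python
import pandas as pd

ContinentDict = {'China': 'Asia', 'United States': 'North America', 'Japan': 'Asia',
                 'Germany': 'Europe', 'Brazil': 'South America'}

# dict -> Series: country names become the index, continents the values
s = pd.Series(ContinentDict, name='continent')

# Name the index 'country', move it into a column, then join the countries per continent
df_gb = (s.rename_axis('country')
          .reset_index()
          .groupby('continent', as_index=False, sort=False)
          .agg(','.join))
print(df_gb)
```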
Upvotes: 2
Reputation: 26676
You can do the following and inspect the groups with .groups, which gives you each group together with the index labels in it. One caveat if you pass a list or array to groupby instead of a column name: it must have the same length as the dataframe. A Series is aligned on the index instead.

import pandas as pd

df = pd.DataFrame(list(ContinentDict.items()), columns=['Country', 'Continent'])  # dict -> two-column dataframe
df.groupby('Continent').groups  # group by continent; shows the row labels in each group
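To get from this lookup table back to the question's statistics, one option is to join it onto the stats dataframe by country name and then group on the new Continent column. This is the "additional column" route from the question. The sketch below assumes a stats dataframe indexed by country with a hypothetical Population column (made-up numbers).

```python
import pandas as pd

ContinentDict = {'China': 'Asia', 'Japan': 'Asia', 'Germany': 'Europe', 'France': 'Europe'}

# Country/continent lookup table, built as in the answer above
lookup = pd.DataFrame(list(ContinentDict.items()), columns=['Country', 'Continent'])

# Hypothetical stats dataframe indexed by country name
stats = pd.DataFrame({'Population': [1400.0, 126.0, 83.0, 67.0]},
                     index=['China', 'Japan', 'Germany', 'France'])

# join() matches stats' index against lookup's 'Country' index (left join by default)
merged = stats.join(lookup.set_index('Country'))
summary = merged.groupby('Continent')['Population'].agg(['mean', 'std'])
print(summary)
```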
Upvotes: 2