Reputation: 55
I have a huge DataFrame that looks like this:
year country population
1971 Afghanistan 11500000
1972 Afghanistan 11800000
1973 Afghanistan 12100000
1974 Afghanistan 12400000
1975 Afghanistan 12700000
I want to create a new DataFrame that calculates the percentage difference in population for every decade, grouped by country:
country 1971-1980 1981-1990 1991-2000 2001-2010
Afghanistan -- -- -- --
Australia -- -- -- --
Need some help to understand how this can be done. Any help would be appreciated.
Upvotes: 2
Views: 483
Reputation: 862611
You can create a decade column, then use DataFrame.pivot_table with sum and add DataFrame.pct_change:
# Shift by one so years ending in 0 (e.g. 1980) stay in the decade 1971-1980
d = (df['year'] - 1) // 10 * 10
df['dec'] = (d + 1).astype(str) + '-' + (d + 10).astype(str)
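As a quick sanity check of the labels, here is a minimal sketch on a few boundary years. The tiny frame is made up for illustration and is not the asker's data. It assumes decades run from years ending in 1 to years ending in 0, as in the desired output, so the year is shifted by one before the floor division:

```python
import pandas as pd

# Hypothetical sample years chosen to hit decade boundaries
df = pd.DataFrame({'year': [1971, 1975, 1980, 1981, 1990, 1991]})

# Shift by one so that 1980 stays in 1971-1980 instead of opening a new decade
d = (df['year'] - 1) // 10 * 10
df['dec'] = (d + 1).astype(str) + '-' + (d + 10).astype(str)

print(df)
```

Here 1980 should land in 1971-1980 and 1981 in 1981-1990.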
Another idea with cut:
# Half-open bins [start, start + 10), so 1981 opens a new decade instead of closing the previous one
bins = range(df['year'].min(), df['year'].max() + 11, 10)
labels = [f'{i}-{j-1}' for i, j in zip(bins[:-1], bins[1:])]
df['dec'] = pd.cut(df['year'], bins=bins, labels=labels, right=False)
df1 = (df.pivot_table(index='country',
                      columns='dec',
                      values='population',
                      aggfunc='sum')
         .pct_change(axis=1))
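For reference, a small end-to-end sketch of the same steps on toy data. The countries 'A' and 'B' and their population figures are invented for illustration only, and the decade labels assume decades of the form 1971-1980:

```python
import pandas as pd

# Hypothetical toy data: two countries, two years in each of two decades
df = pd.DataFrame({
    'year':       [1979, 1980, 1981, 1982] * 2,
    'country':    ['A'] * 4 + ['B'] * 4,
    'population': [100, 100, 110, 110, 200, 200, 300, 300],
})

# Decade label such as '1971-1980' (years ending in 0 close a decade)
d = (df['year'] - 1) // 10 * 10
df['dec'] = (d + 1).astype(str) + '-' + (d + 10).astype(str)

# Sum population per country and decade, then compare each decade to the previous one
df1 = (df.pivot_table(index='country',
                      columns='dec',
                      values='population',
                      aggfunc='sum')
         .pct_change(axis=1))

print(df1)
```

The first decade column is NaN because there is no earlier decade to compare against, and the values are fractions (multiply by 100 for percentages). Note that sum adds up the yearly populations within a decade, so decades with missing years will look smaller. Depending on what "percentage difference" should mean, aggfunc='mean' or aggfunc='last' may be a better fit.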
Upvotes: 2