Reputation: 1081
I have a pandas DataFrame that currently looks like this
Here is the df.head()
and df.tail()
Name Count Year Gender
0 John 9655 1880 M
1 William 9532 1880 M
2 James 5927 1880 M
3 Charles 5348 1880 M
4 George 5126 1880 M
Name Count Year Gender
743745 Zykeem 5 2014 M
743746 Zymeer 5 2014 M
743747 Zymiere 5 2014 M
743748 Zyran 5 2014 M
743749 Zyrin 5 2014 M
It is the number of babies named that during that year. I want to calculate the percent change from the previous year. Is there a pythonic way of using pandas to do that simply or do I need to make a complicated loop.
Upvotes: 1
Views: 482
Reputation: 210842
assuming that you have the following DF:
In [13]: df
Out[13]:
Name Count Year Gender
0 John 9655 1880 M
1 William 9532 1880 M
2 James 5927 1880 M
3 John 9000 1881 M
4 William 8000 1881 M
5 James 5000 1881 M
you can use groupby()
followed by pct_change()
:
In [14]: df['pct_change'] = df.sort_values('Year').groupby('Name').Count.pct_change() * 100
In [15]: df
Out[15]:
Name Count Year Gender pct_change
0 John 9655 1880 M NaN
1 William 9532 1880 M NaN
2 James 5927 1880 M NaN
3 John 9000 1881 M -6.784050
4 William 8000 1881 M -16.072178
5 James 5000 1881 M -15.640290
Upvotes: 0
Reputation: 249153
Pivot your data so it looks like this:
John Simon Barry
1880 30 0 0
1930 20 10 5
1960 18 9 8
Then it's a simple diff, something like this:
df.iloc[1:] = 100.0 * (df.iloc[1:] - df.iloc[:-1]) / df.iloc[:-1]
df.iloc[0] = np.nan
Upvotes: 0
Reputation: 8683
First step involves getting groups. Then you can iterate over the groups, merge your DataFrames, compute your statistics and collect the result in another DataFrame:
perc_chng = []
keys = []
grouped = df.groupby('Year')
for name, group in grouped[1:]:
try:
prev = group.get(name - 1)
except KeyError:
prev = pd.DataFrame()
merged = pd.merge(group, prev, how='outer', on='Name').set_index('Name')
merged['perc'] = merged['Count_x'].difference(merged['Count_y']).\
divide(merged['Count_y']).multiply(100)
perc_chng.extend([])
keys.extend(['{p}-{c}'.format(p=name-1, c=name)])
res = pd.concat(perc_chng, keys=keys)
Upvotes: 1