Mark
Mark

Reputation: 1081

How to calculate average with Pandas Dataframe using Cells from different rows

I have a pandas DataFrame that currently looks like this

Here is the df.head() and df.tail()

      Name  Count  Year Gender
0     John   9655  1880      M
1  William   9532  1880      M
2    James   5927  1880      M
3  Charles   5348  1880      M
4   George   5126  1880      M
           Name  Count  Year Gender
743745   Zykeem      5  2014      M
743746   Zymeer      5  2014      M
743747  Zymiere      5  2014      M
743748    Zyran      5  2014      M
743749    Zyrin      5  2014      M

It is the number of babies named that during that year. I want to calculate the percent change from the previous year. Is there a pythonic way of using pandas to do that simply or do I need to make a complicated loop.

Upvotes: 1

Views: 482

Answers (3)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

assuming that you have the following DF:

In [13]: df
Out[13]:
      Name  Count  Year Gender
0     John   9655  1880      M
1  William   9532  1880      M
2    James   5927  1880      M
3     John   9000  1881      M
4  William   8000  1881      M
5    James   5000  1881      M

you can use groupby() followed by pct_change():

In [14]: df['pct_change'] = df.sort_values('Year').groupby('Name').Count.pct_change() * 100

In [15]: df
Out[15]:
      Name  Count  Year Gender  pct_change
0     John   9655  1880      M         NaN
1  William   9532  1880      M         NaN
2    James   5927  1880      M         NaN
3     John   9000  1881      M   -6.784050
4  William   8000  1881      M  -16.072178
5    James   5000  1881      M  -15.640290

Upvotes: 0

John Zwinck
John Zwinck

Reputation: 249153

Pivot your data so it looks like this:

       John  Simon  Barry
1880     30      0      0
1930     20     10      5
1960     18      9      8

Then it's a simple diff, something like this:

df.iloc[1:] = 100.0 * (df.iloc[1:] - df.iloc[:-1]) / df.iloc[:-1]
df.iloc[0] = np.nan

Upvotes: 0

Kartik
Kartik

Reputation: 8683

First step involves getting groups. Then you can iterate over the groups, merge your DataFrames, compute your statistics and collect the result in another DataFrame:

perc_chng = []
keys = []

grouped = df.groupby('Year')
for name, group in grouped[1:]:
    try:
        prev = group.get(name - 1)
    except KeyError:
        prev = pd.DataFrame()
    merged = pd.merge(group, prev, how='outer', on='Name').set_index('Name')
    merged['perc'] = merged['Count_x'].difference(merged['Count_y']).\
        divide(merged['Count_y']).multiply(100)
    perc_chng.extend([])
    keys.extend(['{p}-{c}'.format(p=name-1, c=name)])

res = pd.concat(perc_chng, keys=keys)

Upvotes: 1

Related Questions