finding highest cumulative percent change

Question

I have a dataframe where the daily sales is recorded. I need to know the fastest growing product. For e.g. in this example, ice-cream sale during 22-23 Jan was the highest across all products.

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

myst="""
20-01-17    pizza   90
21-01-17    pizza   120
22-01-17    pizza   239
23-01-17    pizza   200
20-01-17    fried-rice  100
21-01-17    fried-rice  120
22-01-17    fried-rice  110
23-01-17    fried-rice  190
20-01-17    ice-cream   8
21-01-17    ice-cream   23
22-01-17    ice-cream   21
23-01-17    ice-cream   100
"""
u_cols=['date', 'product', 'sales']

And this is how I created the dataframe:

myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep='	', names = u_cols)

It will look like this in a spreadsheet. How will pandas handle it?

jezrael · Accepted Answer

I think you need pct_change:

df['new'] = df.groupby('product')['sales'].pct_change().mul(100)
print (df)
        date     product  sales         new
0   20-01-17       pizza     90         NaN
1   21-01-17       pizza    120   33.333333
2   22-01-17       pizza    239   99.166667
3   23-01-17       pizza    200  -16.317992
4   20-01-17  fried-rice    100         NaN
5   21-01-17  fried-rice    120   20.000000
6   22-01-17  fried-rice    110   -8.333333
7   23-01-17  fried-rice    190   72.727273
8   20-01-17   ice-cream      8         NaN
9   21-01-17   ice-cream     23  187.500000
10  22-01-17   ice-cream     21   -8.695652
11  23-01-17   ice-cream    100  376.190476

a = df.groupby('product')['sales'].pct_change().idxmax()
print (a)
11

b = 'sale: {}, during: from {} to {}'.format(df.loc[a, 'product'], 
                                            df.loc[a-1, 'date'],
                                            df.loc[a, 'date'])
print (b)
sale: ice-cream, during: from 22-01-17 to 23-01-17

finding highest cumulative percent change

Answers (1)

Related Questions