Reputation: 32336
I have a dataframe where the daily sales is recorded. I need to know the fastest growing product. For e.g. in this example, ice-cream sale during 22-23 Jan was the highest across all products.
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
myst="""
20-01-17 pizza 90
21-01-17 pizza 120
22-01-17 pizza 239
23-01-17 pizza 200
20-01-17 fried-rice 100
21-01-17 fried-rice 120
22-01-17 fried-rice 110
23-01-17 fried-rice 190
20-01-17 ice-cream 8
21-01-17 ice-cream 23
22-01-17 ice-cream 21
23-01-17 ice-cream 100
"""
u_cols=['date', 'product', 'sales']
And this is how I created the dataframe:
myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep='\t', names = u_cols)
It will look like this in a spreadsheet. How will pandas handle it?
Upvotes: 1
Views: 77
Reputation: 863216
I think you need pct_change
:
df['new'] = df.groupby('product')['sales'].pct_change().mul(100)
print (df)
date product sales new
0 20-01-17 pizza 90 NaN
1 21-01-17 pizza 120 33.333333
2 22-01-17 pizza 239 99.166667
3 23-01-17 pizza 200 -16.317992
4 20-01-17 fried-rice 100 NaN
5 21-01-17 fried-rice 120 20.000000
6 22-01-17 fried-rice 110 -8.333333
7 23-01-17 fried-rice 190 72.727273
8 20-01-17 ice-cream 8 NaN
9 21-01-17 ice-cream 23 187.500000
10 22-01-17 ice-cream 21 -8.695652
11 23-01-17 ice-cream 100 376.190476
a = df.groupby('product')['sales'].pct_change().idxmax()
print (a)
11
b = 'sale: {}, during: from {} to {}'.format(df.loc[a, 'product'],
df.loc[a-1, 'date'],
df.loc[a, 'date'])
print (b)
sale: ice-cream, during: from 22-01-17 to 23-01-17
Upvotes: 2