A Adhikari

Reputation: 1

Loop Optimization in Python

I have a DataFrame df like this:

Product Yr Value
A      2014 1
A      2015 3
A      2016 2
B      2015 2
B      2016 1

I want to compute the cumulative maximum of Value within each Product, i.e.:

Product Yr Value
A      2014 1
A      2015 3
A      2016 3
B      2015 2
B      2016 2

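My actual data has about 50,000 products. I am writing code like this:

import pandas as pd

df2 = pd.DataFrame()
for i in df['Product'].unique():
    # .copy() avoids a SettingWithCopyWarning on the assignment below
    data3 = df[df['Product'] == i].copy()
    # sort_values returns a new frame, so the result has to be assigned
    data3 = data3.sort_values(by=['Yr'])
    data3['Value'] = data3['Value'].cummax()
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df2 = pd.concat([df2, data3])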

#df2 is my result

This code takes a very long time (~3 days) for about 50,000 products and 10 years of data. Is there some way to speed it up?

Upvotes: 0

Views: 70

Answers (1)

akuiper

Reputation: 214917

You can use groupby.cummax instead:

df['Value'] = df.sort_values('Yr').groupby('Product').Value.cummax()

df
#  Product    Yr  Value
#0       A  2014      1
#1       A  2015      3
#2       A  2016      3
#3       B  2015      2
#4       B  2016      2
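
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It uses the sample data from the question, with the rows shuffled on purpose so that the sort actually matters. The helper name cummax_by_product is made up for illustration, and it returns a sorted copy rather than assigning back into df, which is a slightly different design from the one-liner above.

```python
import pandas as pd

# Sample data from the question, deliberately out of order
df = pd.DataFrame({
    "Product": ["A", "B", "A", "B", "A"],
    "Yr": [2016, 2016, 2014, 2015, 2015],
    "Value": [2, 1, 1, 2, 3],
})


def cummax_by_product(frame):
    """Hypothetical helper: sort by Product and Yr, then take a
    running maximum of Value within each Product."""
    out = frame.sort_values(["Product", "Yr"]).reset_index(drop=True)
    # groupby(...).cummax() is computed per group in one vectorized pass,
    # so there is no Python-level loop over products
    out["Value"] = out.groupby("Product")["Value"].cummax()
    return out


result = cummax_by_product(df)
print(result)
```

The one-liner in the answer works without resetting the index because the assignment back to df['Value'] aligns on the original index labels. The sketch returns a new frame instead, which is convenient if you also want the rows in sorted order.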

Upvotes: 2
