I have a DataFrame df like this:
Product Yr Value
A 2014 1
A 2015 3
A 2016 2
B 2015 2
B 2016 1
I want the cumulative maximum of Value within each Product, in order of Yr, i.e.:
Product Yr Value
A 2014 1
A 2015 3
A 2016 3
B 2015 2
B 2016 2
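For reference, this "running maximum" is what pandas' Series.cummax computes: each position holds the largest value seen so far. A minimal sketch using only product A's values from the table above (in year order):

```python
import pandas as pd

# Values for product A in year order (2014, 2015, 2016), taken from the table above
values_a = pd.Series([1, 3, 2])

# cummax keeps the largest value seen so far at each position
running_max = values_a.cummax()
print(running_max.tolist())  # [1, 3, 3]
```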
My actual data has about 50,000 products. I am using code like this:
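import pandas as pd

df2 = pd.DataFrame()
for i in df['Product'].unique():
    # sort_values returns a new frame, so the result must be assigned
    data3 = df[df['Product'] == i].sort_values(by=['Yr'])
    data3['Value'] = data3['Value'].cummax()
    # originally df2.append(data3); DataFrame.append was removed in pandas 2.0
    df2 = pd.concat([df2, data3])
# df2 is my result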
This code takes a very long time (about 3 days) for roughly 50,000 products and 10 years. Is there a way to speed it up?
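Two costs dominate that loop: each df['Product'] == i scans the entire frame (about 50,000 full scans), and each concatenation copies everything accumulated in df2 so far, so the total work grows roughly quadratically. A minimal sketch, assuming the same column names as the sample data, that keeps the per-product logic but removes both costs by iterating over groupby groups and concatenating once at the end (it still loops in Python, so a fully vectorized approach should be faster):

```python
import pandas as pd

# Sample data from the question; the real frame is assumed to have the same columns
df = pd.DataFrame({
    "Product": ["A", "A", "A", "B", "B"],
    "Yr": [2014, 2015, 2016, 2015, 2016],
    "Value": [1, 3, 2, 2, 1],
})

pieces = []
# groupby partitions the frame once, instead of one boolean scan per product
for _, group in df.groupby("Product", sort=False):
    group = group.sort_values("Yr")
    group = group.assign(Value=group["Value"].cummax())
    pieces.append(group)

# A single concat at the end avoids re-copying the accumulated result each iteration
df2 = pd.concat(pieces)
print(df2)
```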
You can use groupby.cummax instead of looping over products in Python:
df['Value'] = df.sort_values('Yr').groupby('Product').Value.cummax()
df
#Product Yr Value
#0 A 2014 1
#1 A 2015 3
#2 A 2016 3
#3 B 2015 2
#4 B 2016 2
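The sort_values('Yr') step matters because cummax follows row order, and assigning the result back to df['Value'] works because pandas aligns the new column on the index, so each row receives its own running maximum even though the intermediate frame is reordered. A small check, assuming the question's sample data with its rows deliberately reversed so that years are out of order:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "A", "A", "B", "B"],
    "Yr": [2014, 2015, 2016, 2015, 2016],
    "Value": [1, 3, 2, 2, 1],
})

# Reverse the rows so years run backwards within each product
reversed_df = df.iloc[::-1].copy()

# Sort by year so cummax runs chronologically within each product;
# the assignment aligns on the index, not on position
reversed_df["Value"] = reversed_df.sort_values("Yr").groupby("Product")["Value"].cummax()

# Restoring the original row order shows the expected result
print(reversed_df.sort_index())
```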