Reputation: 159
I currently have a dataframe that looks like the below where I need to reset the cumsum everytime it crosses a multiple of 1000 ex (2000,3000...etc) and
Production ID cumsum
2017-10-19 1054 1323217 1054
2017-10-20 0 1323217 1054
2017-10-21 0 1323217 1054
2017-10-22 0 1323217 1054
2017-10-23 0 1323217 1054
for example in the above, I need a df that looks like the below:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1000 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4000 4
2017-10-23 0 1323217 4018 108 0
2017-10-23 500 1323218 500 500 0
The below, correctly resets the value every 1000 but I can't quite seem to figure out how to translate this over grouping it by ID and rounding it to the 1000s.
maxvalue = 1000
lastvalue = 0
newcum = []
for row in df.iterrows():
thisvalue = row[1]['cumsum'] + lastvalue
if thisvalue > maxvalue:
thisvalue = 0
newcum.append( thisvalue )
lastvalue = thisvalue
df['newcum'] = newcum
thanks to the answer below I'm now able to calculate cumulative number generated, but I need to calc incremental # generated.
df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh )
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] %
thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
My initial thinking was to try something similar to:
df['numGen1'] = df['cumsum2'].diff()
Final Edit Tested and working. Thanks for the help getting here
I was overthinking it, below is how I was able to do it:
df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh )
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
numgen = []
adjcumsum = []
for i in range(len(df['cumsum'])):
if df['cumsum'][i] > thresh and (df['ID'][i] == df['ID'][i-1]):
numgenv = (df['cumsum'][i] // thresh) - (df['cumsum'][i-1] // thresh)
numgen.append(numgenv)
elif df['cumsum'][i] > thresh:
numgenv = (df['cumsum'][i] // thresh)
numgen.append(numgenv)
else:
numgenv = 0
numgen.append(numgenv)
df['numgen2.0'] = numgen
Upvotes: 1
Views: 201
Reputation: 150735
IIUC, this simply is an integer division problem with some tricks:
thresh = 1000
df['cumsum'] = df['Production'].cumsum()
# how many times cumsum passes thresh
multiple = (df['cumsum'] // thresh )
# detect where thresh is pass
mask = multiple.diff().ne(0)
# update the number generated:
df['numberGenerated'] = np.where(mask, multiple, 0)
# then the adjusted cumsum
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
Output:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1054 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4108 4
2017-10-23 0 1323217 4108 108 0
2017-10-23 500 1323218 4608 608 0
Upvotes: 2