Reputation: 421
I have a data frame with dates and values and want to a cumsum for only the positive numbers and one for only the negatives. The dates sometimes have the same date multiple times and then a couple of days are missing (no values = no row)
First I just tested a cumulative sum. These were cumulative but not in order of date:
df$cumsum <- cumsum(df$values)
# or
df$cumsum <- ave(df$values, FUN=cumsum)
# Should cumulate by date but did not in right order
df$cumsum <- cumsum(df[order(df$date, df$values), "values"])
Finally found a solution which does the first step as I wanted (not really as I wanted to do in in a dataframe but does the job):
dt <- data.table(df)
dt[order(date), cumsum := cumsum(values)]
Great, however every attempt to filter values > 0 didn't work out. Finally I subsetted the data and got the result but it's not really what I wanted.
dt.pos <- data.table(subset(df, values> 0))
dt.pos[order(date), cumsum := cumsum(values)]
dt.neg <- data.table(subset(df, values < 0))
dt.neg[order(date), cumsum := cumsum(values)]
I'm looking for something as simple as the Python equivalent (with an ordered data frame):
df["cumsum_pos"] = df["values"][df["values"] > 0].cumsum()
df["cumsum_neg"] = df["values"][df["values"] < 0].cumsum()
/edit
df <- data.frame(date = as.Date(c("2016-12-08", "2016-12-07", "2016-12-05", "2017-01-05",
"2017-01-10", "2017-01-11", "2017-01-11")),
values = c(10, -10, 5, 5, -7, 8, 8))
# just the cumsum
# expected output = c(5, -5, 5, 10, 3, 11, 19)
df$cumsum <- cumsum(df$values)
# output = c(10, 0, 5, 10, 3, 11, 19)
df$cumsum <- ave(df$values, FUN=cumsum)
# output = c(10, 0, 5, 10, 3, 11, 19)
df$cumsum <- cumsum(df[order(df$date, df$values), "values"])
# output = c(5, -5, 5, 10, 3, 11, 19) correct in this example
# doesn't work with dates in a different order 2016-12-31, 2016-12-30, ... 2015-12-31, 2015-12-30
# Now for just the positives
# expected output = c(10, 0, 5, 15, 15, 23, 31)
df$cumsum.pos[df$values > 0] <- cumsum(df[order(df$date, df$values), "values"][df$values > 0])
# output = c(5, NA, 15, 20, NA, 28, 36)
# And then the same with just the negatives
/edit
nicolas comment doesn't produce the correct output
df<-df[order(df$date),]
# values = c(5, -10, 10, 5, -7, 8, 8)
# expected output = c(5, 5, 15, 20, 20, 28, 36)
df$cumsum<-ave(df$values,df$values>0,FUN=cumsum)
# output = c(5, -10, 15, 20, -17, 28, 36)
Upvotes: 0
Views: 1445
Reputation: 2469
You can use this.
library(data.table)
df <- as.data.table(df)
# Order by date
df <- df[order(date)]
# Perform the cumsum for positives and negatives separately
df[, expected := cumsum(values), by = sign(values)]
# Just for the negatives, get the previous positive value
df[, expected := ifelse(values > 0, expected, c(0, expected[-.N]))]
print(df)
date values expected
1: 2016-12-05 5 5
2: 2016-12-07 -10 5
3: 2016-12-08 10 15
4: 2017-01-05 5 20
5: 2017-01-10 -7 20
6: 2017-01-11 8 28
7: 2017-01-11 8 36
Note that if there are more than one consecutive negative values, you have to repeat the operation. For instance, if your data frame is this one:
df <- data.frame(date = as.Date(c("2016-12-08", "2016-12-07", "2016-12-05", "2017-01-05","2017-01-10", "2017-01-10", "2017-01-11", "2017-01-11")),
values = c(10, -10, 5, 5, -7, -15, 8, 8))
One single execution of the above code would produce the following output:
date values expected
1: 2016-12-05 5 5
2: 2016-12-07 -10 5
3: 2016-12-08 10 15
4: 2017-01-05 5 20
5: 2017-01-10 -7 20
6: 2017-01-10 -15 -17
7: 2017-01-11 8 28
8: 2017-01-11 8 36
The value -17 would be wrong. In order to avoid this problem, you can repeat the process until there aren't any negative values left. So the full code would be:
df <- df[order(date)]
df[, expected := cumsum(values), by = sign(values)]
# If there are negative values, repeat the process
while(length(which(df$expected < 0))){
df[, expected := ifelse(values > 0, expected, c(0, expected[-.N]))]
}
print(df)
date values expected
1: 2016-12-05 5 5
2: 2016-12-07 -10 5
3: 2016-12-08 10 15
4: 2017-01-05 5 20
5: 2017-01-10 -7 20
6: 2017-01-10 -15 20
7: 2017-01-11 8 28
8: 2017-01-11 8 36
Upvotes: 1