Reputation: 543
I have a data.table in R with a column called "score" containing both positive and negative real numbers. I want to add a new column called "sum" which represents the cumulative sum of the "score" column. However, I need the cumulative sum to have a minimum threshold. If the sum at any point falls below the minimum threshold, it should be set to the minimum value, and the next step of the cumulative sum should start from that minimum value.
I have tried using the cumsum() function to calculate the cumulative sum, but I'm not sure how to incorporate the minimum threshold logic. Here's my code so far:
dt <- data.table(score = c(10, -13, 8, -12, 6))
dt[, sum := cumsum(score)]
# its wrong
print(dt)
score sum
1: 10 10
2: -13 -3
3: 8 5
4: -12 -7
5: 6 -1
I also tried
min_value <- -3
dt[, sum := {
temp_sum <- cumsum(score)
temp_sum[temp_sum < min_value] <- min_value
temp_sum
}]
# its also wrong
print(dt)
score sum
1: 10 10
2: -13 -3
3: 8 5
4: -12 -3
5: 6 -1 # expected sum is 3 here
I also have tried this from Cumulative sum with a threshold window in R data.table and dplyr / R cumulative sum with reset
sum_score <- function(thresh) {
function(x) {
purrr::accumulate(x, ~dplyr::if_else(.x<=thresh, .y, .x+.y))
}
}
dt[, sum := sum_score(-3)(score)]
print(dt)
score sum
1: 10 10
2: -13 -3
3: 8 8
4: -12 -4
5: 6 6
Expected output:
score sum
1: 10 10
2: -13 -3
3: 8 5
4: -12 -3
5: 6 3
Upvotes: 0
Views: 112
Reputation: 9240
Here is a dplyr
option:
library(dplyr)
library(purrr)
library(tidyr)
dt <- data.table::data.table(score = c(10, -13, 8, -12, 6, -20, 40))
dt |>
mutate(
ID = row_number(),
raw_score = score,
indicator = (ifelse(cumsum(score) < -3, row_number(), NA))) |>
fill(indicator) |>
group_split(indicator) |>
map_dfr(~.x |>
mutate(score = ifelse(
!is.na(indicator) & row_number() == 1, -3, score
))) |>
arrange(ID) |>
group_by(indicator) |>
mutate(sum = cumsum(score)) |>
ungroup() |>
select(raw_score, sum)
Output:
raw_score sum
<dbl> <dbl>
1 10 10
2 -13 -3
3 8 5
4 -12 -3
5 6 3
6 -20 -3
7 40 37
Upvotes: 1
Reputation: 631
I understand you preferably want a data.table
solution but here is a simple dplyr
method, which first generated the cumsum
then alters its values accordingly:
library(dplyr)
dt <- data.frame(score = c(10, -13, 8, -12, 6))
thresh = -3
dt %>% mutate(sum = cumsum(score)) %>%
mutate(sum = case_when(
lag(sum) < thresh ~ score + thresh,
sum < thresh ~ thresh,
is.na(sum) ~ score,
.default = sum)
)
Output:
score sum
1 10 10
2 -13 -3
3 8 5
4 -12 -3
5 6 3
Upvotes: 1