Paul Tansley
Paul Tansley

Reputation: 181

Conditional rolling sum loop in R

I'm looking for some kind kind of conditional rolling sum I thought a while loop would do what I need, but I'm having trouble implementing it. So this should look like PCAR[1]*time[1]+PCAR[2]*time[2]+PCAR[3]*time[3] etc where [] references the row of the column, and this would loop until the cumulative time value reachs <= 100 years, then the loop should add this value to a column and then start again until cumulative time is between 100 and <= 200, and so on until the bottom of the data set. It's going to be applied to datasets of varying sizes with tens of thousands of years in.

I hope that makes sense. In the example data below the PCAR_BIN column is what I'm aiming for as the outcome.

df <- tibble(cumulative.time = c(20,40,60,80,100, 120,140,160,180,200), 
             PCAR =1:10,
             time = 1:10,
             depth.along.core = 1:10, 
             Age.cal.BP = 1:10, 
             AFBD = 1:10, 
             assumed.C = rep(0.5, 10),
             PCAR_BIN = c(55,330,NA,NA,NA,NA,NA,NA,NA,NA))

The function looks like

MBA <- function(data) {
  require(dplyr)
  data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP[is.na(PCA_NCP)] <- 0, 
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up,
                  PCAR_BIN = ifelse(cumulative.time <= 100, sum(PCAR*time+lead(PCAR)*lead(time),NA)
                    )} 

Obviously I had no luck with the ifelse satement, as it would only work for one iteration of time and the sum is wrong. I've tried similar with while and for loops but with no luck. Part of the problem is I'm not sure how to express the sum that I need. I've also tried binning the data with case_when, and working off that, but with no luck again.

Thanks people :)

EDIT

Following Martins method I now have the function working up to creating the ROLLSUM Column, I now need to create a column that will give the maximum value for each century group. Running the code from slicemax onward gives me the error: Error in eval(lhs, parent, parent) : object 'tmp' not found

I've added the real data too.

dput(head(EMC))
structure(list(depth.along.core = c(0.5, 1.5, 2.5, 3.5, 4.5, 
5.5), Age.cal.BP = c(-56.016347625, -55.075825875, -54.201453125, 
-53.365755375, -52.541258625, -51.700488875), time = c(0.94052175, 
0.87437275, 0.83569775, 0.82449675, 0.84076975, 0.88451675), 
    cumulative.time = c(0.94052175, 1.8148945, 2.65059225, 3.475089, 
    4.31585875, 5.2003755), AFBD = c(0.0711, 0.057, 0.0568, 0.0512, 
    0.0559, 0.0353), assumed.C = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5
    )), row.names = c(NA, 6L), class = "data.frame")

MBA <- function(data) {
  require(dplyr)
  data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP[is.na(PCA_NCP)] <- 0, 
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up)%>%
                  slice(1:(n()-1))%>%
                  group_by(Century = cut(cumulative.time, breaks = seq(0, max(cumulative.time), 100)))%>%
                  mutate(ROLLSUM = rev(cumsum(PCAR*time)))%>%
                  slice_max(order_by = ROLLSUM, n = 1) %>%
                  pull(ROLLSUM)%>%
                  df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))}

Upvotes: 0

Views: 300

Answers (2)

Martin Schmelzer
Martin Schmelzer

Reputation: 23919

You could try this:

# Get cumulative sums by group (assuming per century groups)
df <- df %>% 
  group_by(Century = cut(cumulative.time, 
                         breaks = seq(0, max(cumulative.time), 100))) %>%
  mutate(ROLLSUM = rev(cumsum(PCAR * time)))

# Get maximum of each group
groupMaxima <- df %>%
  slice_max(order_by = ROLLSUM, n = 1) %>%
  pull(ROLLSUM)

# Fill column as desired
df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))  

We simply create a factor column to group the cumulative time column by centuries and use that factor to sum up the values. Lastly we edit the rolling sum column to contain only the max values and fill the other rows with NA.

# A tibble: 10 x 10
# Groups:   Group [2]
   cumulative.time  PCAR  time depth.along.core Age.cal.BP  AFBD assumed.C PCAR_BIN Group     ROLLSUM
             <dbl> <int> <int>            <int>      <int> <int>     <dbl>    <dbl> <fct>       <int>
 1              20     1     1                1          1     1       0.5       55 (0,100]        55
 2              40     2     2                2          2     2       0.5      330 (0,100]       330
 3              60     3     3                3          3     3       0.5       NA (0,100]        NA
 4              80     4     4                4          4     4       0.5       NA (0,100]        NA
 5             100     5     5                5          5     5       0.5       NA (0,100]        NA
 6             120     6     6                6          6     6       0.5       NA (100,200]      NA
 7             140     7     7                7          7     7       0.5       NA (100,200]      NA
 8             160     8     8                8          8     8       0.5       NA (100,200]      NA
 9             180     9     9                9          9     9       0.5       NA (100,200]      NA
10             200    10    10               10         10    10       0.5       NA (100,200]      NA

Edit:

For this special case:

MBA <- function(data) {
  require(dplyr)
  data <- data %>% mutate(PCAR = ((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
                  PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP), 
                  PCA_NCP[is.na(PCA_NCP)] <- 0, 
                  CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
                  CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up) 
  
  data <- data %>%
    group_by(CTIME = cut(cumsum(cumulative.time), 
                         breaks = seq(0, max(cumsum(cumulative.time), na.rm = T), 100))) %>%
    mutate(ROLLSUM = rev(cumsum(PCAR*time))) 
  
  groupMaxima <- data %>% slice_max(order_by = ROLLSUM, n = 1) %>%
    pull(ROLLSUM)
  
  data$ROLLSUM <- c(groupMaxima, rep(NA, nrow(data) - length(groupMaxima)))
  data
}

Upvotes: 2

dvd280
dvd280

Reputation: 962

There are a number of ways, if your steps are really steps of 100 years, and the values go 0,20,40 in constant intervals- you can do this natively:

steps = 100
intervals = 20
ratio = steps / intervals
columns = df[,c("PCAR","time")] 
indices = rep(ratio,nrow(df)) %>% cumsum
    
PCAR_BIN = lapply(indices,function(x){
             localRange = (x-ratio):x
             sum(columns[localRange,1] * columns[localRange,2])
           })%>% unlist

we can now bind PICAR_BIN:

df = cbind(df,PICAR_BIN)

Upvotes: 0

Related Questions