Reputation: 181
I'm looking for some kind kind of conditional rolling sum I thought a while loop would do what I need, but I'm having trouble implementing it. So this should look like PCAR[1]*time[1]+PCAR[2]*time[2]+PCAR[3]*time[3] etc where [] references the row of the column, and this would loop until the cumulative time value reachs <= 100 years, then the loop should add this value to a column and then start again until cumulative time is between 100 and <= 200, and so on until the bottom of the data set. It's going to be applied to datasets of varying sizes with tens of thousands of years in.
I hope that makes sense. In the example data below the PCAR_BIN column is what I'm aiming for as the outcome.
df <- tibble(cumulative.time = c(20,40,60,80,100, 120,140,160,180,200),
PCAR =1:10,
time = 1:10,
depth.along.core = 1:10,
Age.cal.BP = 1:10,
AFBD = 1:10,
assumed.C = rep(0.5, 10),
PCAR_BIN = c(55,330,NA,NA,NA,NA,NA,NA,NA,NA))
The function looks like
MBA <- function(data) {
require(dplyr)
data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP),
PCA_NCP[is.na(PCA_NCP)] <- 0,
CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up,
PCAR_BIN = ifelse(cumulative.time <= 100, sum(PCAR*time+lead(PCAR)*lead(time),NA)
)}
Obviously I had no luck with the ifelse satement, as it would only work for one iteration of time and the sum is wrong. I've tried similar with while and for loops but with no luck. Part of the problem is I'm not sure how to express the sum that I need. I've also tried binning the data with case_when, and working off that, but with no luck again.
Thanks people :)
EDIT
Following Martins method I now have the function working up to creating the ROLLSUM Column, I now need to create a column that will give the maximum value for each century group. Running the code from slicemax onward gives me the error: Error in eval(lhs, parent, parent) : object 'tmp' not found
I've added the real data too.
dput(head(EMC))
structure(list(depth.along.core = c(0.5, 1.5, 2.5, 3.5, 4.5,
5.5), Age.cal.BP = c(-56.016347625, -55.075825875, -54.201453125,
-53.365755375, -52.541258625, -51.700488875), time = c(0.94052175,
0.87437275, 0.83569775, 0.82449675, 0.84076975, 0.88451675),
cumulative.time = c(0.94052175, 1.8148945, 2.65059225, 3.475089,
4.31585875, 5.2003755), AFBD = c(0.0711, 0.057, 0.0568, 0.0512,
0.0559, 0.0353), assumed.C = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5
)), row.names = c(NA, 6L), class = "data.frame")
MBA <- function(data) {
require(dplyr)
data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP),
PCA_NCP[is.na(PCA_NCP)] <- 0,
CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up)%>%
slice(1:(n()-1))%>%
group_by(Century = cut(cumulative.time, breaks = seq(0, max(cumulative.time), 100)))%>%
mutate(ROLLSUM = rev(cumsum(PCAR*time)))%>%
slice_max(order_by = ROLLSUM, n = 1) %>%
pull(ROLLSUM)%>%
df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))}
Upvotes: 0
Views: 300
Reputation: 23919
You could try this:
# Get cumulative sums by group (assuming per century groups)
df <- df %>%
group_by(Century = cut(cumulative.time,
breaks = seq(0, max(cumulative.time), 100))) %>%
mutate(ROLLSUM = rev(cumsum(PCAR * time)))
# Get maximum of each group
groupMaxima <- df %>%
slice_max(order_by = ROLLSUM, n = 1) %>%
pull(ROLLSUM)
# Fill column as desired
df$ROLLSUM <- c(groupMaxima, rep(NA, nrow(df) - length(groupMaxima)))
We simply create a factor column to group the cumulative time column by centuries and use that factor to sum up the values. Lastly we edit the rolling sum column to contain only the max values and fill the other rows with NA
.
# A tibble: 10 x 10
# Groups: Group [2]
cumulative.time PCAR time depth.along.core Age.cal.BP AFBD assumed.C PCAR_BIN Group ROLLSUM
<dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <fct> <int>
1 20 1 1 1 1 1 0.5 55 (0,100] 55
2 40 2 2 2 2 2 0.5 330 (0,100] 330
3 60 3 3 3 3 3 0.5 NA (0,100] NA
4 80 4 4 4 4 4 0.5 NA (0,100] NA
5 100 5 5 5 5 5 0.5 NA (0,100] NA
6 120 6 6 6 6 6 0.5 NA (100,200] NA
7 140 7 7 7 7 7 0.5 NA (100,200] NA
8 160 8 8 8 8 8 0.5 NA (100,200] NA
9 180 9 9 9 9 9 0.5 NA (100,200] NA
10 200 10 10 10 10 10 0.5 NA (100,200] NA
Edit:
For this special case:
MBA <- function(data) {
require(dplyr)
data <- data %>% mutate(PCAR = ((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP),
PCA_NCP[is.na(PCA_NCP)] <- 0,
CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP))),
CCP_Top_Down = CCP_Bottom_Up[1]- CCP_Bottom_Up)
data <- data %>%
group_by(CTIME = cut(cumsum(cumulative.time),
breaks = seq(0, max(cumsum(cumulative.time), na.rm = T), 100))) %>%
mutate(ROLLSUM = rev(cumsum(PCAR*time)))
groupMaxima <- data %>% slice_max(order_by = ROLLSUM, n = 1) %>%
pull(ROLLSUM)
data$ROLLSUM <- c(groupMaxima, rep(NA, nrow(data) - length(groupMaxima)))
data
}
Upvotes: 2
Reputation: 962
There are a number of ways, if your steps are really steps of 100 years, and the values go 0,20,40 in constant intervals- you can do this natively:
steps = 100
intervals = 20
ratio = steps / intervals
columns = df[,c("PCAR","time")]
indices = rep(ratio,nrow(df)) %>% cumsum
PCAR_BIN = lapply(indices,function(x){
localRange = (x-ratio):x
sum(columns[localRange,1] * columns[localRange,2])
})%>% unlist
we can now bind PICAR_BIN
:
df = cbind(df,PICAR_BIN)
Upvotes: 0