J.VDW

Reputation: 101

Calculate the sum of subsequent rows for a subset of values

I posted an earlier question on this topic, but I don't think it was clear enough. Sorry. So, this is my second try.

I have data on the amount of milk consumed (volume) at different times for different individuals.

individual <- c(rep("A", 7), rep("B", 6))
time <- c(0, 12, 20, 26, 32, 36, 50, 0, 10, 21, 24, 36, 60)
volume <- c(0.3, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.2, 0.4, 0.4, 0.3, 0.2, 0.1)
df <- data.frame(individual, time, volume)

I want to know how much milk is consumed during the 24 hours following each milk ingestion. For example, individual A drank 0.3 L of milk at time 0 h (first line in df), then an additional 0.2 L at time 12 and 0.1 L at time 20, which gives a total of 0.6 L drunk during the 24-hour period following that ingestion.
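To make the rule concrete, here is a minimal sketch of the calculation for that first line only. It assumes the window is closed at both ends, i.e. it covers times from t to t + 24 inclusive, which is what the desired output further down implies. The name first_total is just for illustration.

```r
# Same example data as in the question.
df <- data.frame(
  individual = c(rep("A", 7), rep("B", 6)),
  time   = c(0, 12, 20, 26, 32, 36, 50, 0, 10, 21, 24, 36, 60),
  volume = c(0.3, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.2, 0.4, 0.4, 0.3, 0.2, 0.1)
)

# Rows of individual A whose time falls in the closed window [0, 24].
in_window <- df$individual == "A" & df$time >= 0 & df$time <= 0 + 24

# Total volume in that window: 0.3 + 0.2 + 0.1
first_total <- sum(df$volume[in_window])
first_total
```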

I want to calculate this for every line for each individual and the desired output would be:

res_volume <- c(0.6, 1.1, 0.9, 1.0, NA, NA, NA, 1.3, 1.1, 0.9, 0.5, 0.3, NA)
df2 <- data.frame(df, res_volume)

The NAs are there because there is not enough data to cover the 24 hours after the milk ingestion (the difference in time between the last line for that individual and the given line is less than 24 hours).

Any idea how I could achieve this? Your answers are really appreciated.

Upvotes: 0

Views: 452

Answers (2)

cdeterman

Reputation: 19960

Does this function work for you? You can set the interval to whatever length you like; the default is 24.

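milk_iter_sum <- function(df, interval=24){
  # Assumes df is sorted by individual and, within each individual, by time
  res_volume <- vector("numeric")
  # Split on the column of df rather than on a global variable
  df_list <- split(df, f=df$individual)
  for(i in seq_along(df_list)){
    cur_df <- df_list[[i]]
    last_time <- cur_df$time[nrow(cur_df)]
    for(j in seq_len(nrow(cur_df))){
      # Rows of this individual inside the closed window [time[j], time[j] + interval]
      inner_cur_df <- cur_df[cur_df$time >= cur_df$time[j] & cur_df$time <= cur_df$time[j]+interval,]

      if(last_time - cur_df$time[j] < interval){
        # Records end before the window does, so the total is unknown
        res_volume <- append(res_volume, NA)
      }else{
        res_volume <- append(res_volume, sum(inner_cur_df$volume))
      }
    }
  }
  return(cbind(df, res_volume))
}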

milk_iter_sum(df)

   individual time volume res_volume
1           A    0    0.3        0.6
2           A   12    0.2        1.1
3           A   20    0.1        0.9
4           A   26    0.4        1.0
5           A   32    0.3         NA
6           A   36    0.1         NA
7           A   50    0.2         NA
8           B    0    0.2        1.3
9           B   10    0.4        1.1
10          B   21    0.4        0.9
11          B   24    0.3        0.5
12          B   36    0.2        0.3
13          B   60    0.1         NA
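For comparison, the same rule can be sketched in base R without growing a vector with append(). This is only one possible variant, not a replacement for the function above: window_sum is a hypothetical helper name, the window is again assumed to be closed at both ends, and the NA test uses the largest time recorded for the individual instead of the last row.

```r
# Same example data as in the question.
df <- data.frame(
  individual = c(rep("A", 7), rep("B", 6)),
  time   = c(0, 12, 20, 26, 32, 36, 50, 0, 10, 21, 24, 36, 60),
  volume = c(0.3, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.2, 0.4, 0.4, 0.3, 0.2, 0.1)
)

# Hypothetical helper: for one individual's times and volumes, return the
# volume summed over [t, t + interval] for each t, or NA when the records
# stop less than `interval` after t.
window_sum <- function(time, volume, interval = 24) {
  vapply(seq_along(time), function(j) {
    if (max(time) - time[j] < interval) return(NA_real_)
    in_window <- time >= time[j] & time <= time[j] + interval
    sum(volume[in_window])
  }, numeric(1))
}

# Fill the result by row index, one individual at a time, so the rows of df
# do not have to be grouped or sorted beforehand.
df$res_volume <- NA_real_
for (rows in split(seq_len(nrow(df)), df$individual)) {
  df$res_volume[rows] <- window_sum(df$time[rows], df$volume[rows])
}
df
```

Writing results back by row index keeps each value aligned with its own row, whereas appending in split order relies on df already being grouped by individual.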

Upvotes: 1

Carl Witthoft

Reputation: 21502

If I got your meaning, start by identifying the rows which follow a "long interval":

therows<- which(df$interval>1)+1

Then

df[therows,c(1,2,4)]

should be your desired result.

Upvotes: 0
