Reputation: 65
I have the following data frame:
data <- data.frame("Group" = c(1,1,1,1,1,1,1,1,2,2,2,2),
"Days" = c(1,2,3,4,5,6,7,8,1,2,3,4), "Num" = c(10,12,23,30,34,40,50,60,2,4,8,12))
I need to take the last value in Num and divide it by all of the preceding values. Then, I need to move to the second to the last value in Num and do the same, until I reach the first value in each group.
Edited based on the comments below: In plain language and showing all the math, starting with the first group as suggested below, I am trying to achieve the following: Take 60 (last value in group 1) and:
Day Num Res
7 60/50 1.2
6 60/40 1.5
5 60/34 1.76
4 60/30 2
3 60/23 2.61
2 60/12 5
1 60/10 6
Then keep only the row that has the value 2, as I don't care about the others (I want the value that is greater or equal to 2 that is the closest to 2) and return the day of that value, which is 4, as well. Then, move on to 50 and do the following:
Day Num Res
6 50/40 1.25
5 50/34 1.47
4 50/30 1.67
3 50/23 2.17
2 50/12 4.17
1 50/10 5
Then keep only the row that has the value 2.17 and return the day of that value, which is 3, as well. Then, move on to 40 and do the same thing over again, move on to 34, then 30, then 23, then 12, the last value (or Day 1 value) I don't care about. Then move on to the next group's last value (12) and repeat the same approach for that group (12/8, 12/4, 12/2; 8/4, 8/2; 4/2)
I would like to store the results of these divisions, but only the most recent result that is greater than or equal to 2. I would also like to return the day on which that result was achieved. Basically, I am trying to calculate doubling time for each day. I also need this to be grouped by Group. Normally, I would use dplyr for this, but I am not sure how to link up a loop with dplyr to take advantage of group_by. Also, I could be overlooking lapply or some variation thereof. My expected dataframe with the results would ideally be this:
data2 <- data.frame(divres = c(NA,NA,2.3,2.5,2.833333333,3.333333333,2.173913043,2,NA,2,2,3),
obs_n =c(NA,NA,1,2,2,2,3,4,NA,1,2,2))
data3 <- bind_cols(data, data2)
I have tried this first loop to calculate the division but I am lost as to how to move on to the next last value within each group. Right now, this is ignoring the group, though I obviously have not told it to group as I am unclear as to how to do this outside of dplyr.
for(i in 1:nrow(data))
data$test[i] <- ifelse(!is.na(data$Num), last(data$Num)/data$Num[i] , NA)
I also get the following error when I run it:
number of items to replace is not a multiple of replacement length
To store the division, I have tried this:
division <- function(x){
if(x>=2){
return(x)
} else {
return(FALSE)
}
}
for (i in 1:nrow(data)){
data$test[i]<- division(data$test[i])
}
Now, this approach works, but only if I need to run it once on the last observation and only if I apply it to one group. I have 209 groups and many days that I would need to run this over. I am not sure how to combine the first for loop with the division function, and I am also totally lost as to how to do this by group and move on to the next last values. Any suggestions would be appreciated.
Upvotes: 0
Views: 707
Reputation: 462
You can modify your division function to handle a vector and return a dataframe with two columns, divres and ind. The latter is the row index that will be used to calculate obs_n, as shown below:
division <- function(x){
lenx <- length(x)
y <- vector(mode="numeric", length = lenx)
z <- vector(mode="numeric", length = lenx)
for (i in lenx:1){
y[i] <- ifelse(length(which(x[i]/x[1:i]>=2))==0,NA,x[i]/x[1:i] [max(which(x[i]/x[1:i]>=2))])
z[i] <- ifelse(is.na(y[i]),NA,max(which(x[i]/x[1:i]>=2)))
}
df <- data.frame(divres = y, ind = z)
return(df)
}
Check the output of the division function created above, using data$Num as input:
> division(data$Num)
divres ind
1 NA NA
2 NA NA
3 2.300000 1
4 2.500000 2
5 2.833333 2
6 3.333333 2
7 2.173913 3
8 2.000000 4
9 NA NA
10 2.000000 9
11 2.000000 10
12 3.000000 10
Use cbind to combine the above output with the dataframe data, then use pipes and mutate from dplyr to look up the obs_n value in Days using ind, and select the appropriate columns to generate the desired dataframe data2:
library(dplyr)
data2 <- cbind.data.frame(data, division(data$Num)) %>% mutate(obs_n = Days[ind]) %>% select(-ind)
Output
> data2
Group Days Num divres obs_n
1 1 1 10 NA NA
2 1 2 12 NA NA
3 1 3 23 2.300000 1
4 1 4 30 2.500000 2
5 1 5 34 2.833333 2
6 1 6 40 3.333333 2
7 1 7 50 2.173913 3
8 1 8 60 2.000000 4
9 2 1 2 NA NA
10 2 2 4 2.000000 1
11 2 3 8 2.000000 2
12 2 4 12 3.000000 2
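One caveat: division(data$Num) above runs over the whole column, so ind can point at a row in another group. With the sample data this happens not to matter, but it may with 209 groups. Below is a minimal sketch of the same idea applied one group at a time. It uses base R split() rather than dplyr, and the helper name doubling_index is made up for this example. It also assumes the rows are already sorted by Group and Days.

```r
data <- data.frame(Group = c(1,1,1,1,1,1,1,1,2,2,2,2),
                   Days  = c(1,2,3,4,5,6,7,8,1,2,3,4),
                   Num   = c(10,12,23,30,34,40,50,60,2,4,8,12))

# Hypothetical helper: for each position i, return the latest position j <= i
# with x[i] / x[j] >= 2, or NA if no earlier value is small enough.
doubling_index <- function(x) {
  vapply(seq_along(x), function(i) {
    hits <- which(x[i] / x[seq_len(i)] >= 2)
    if (length(hits) == 0) NA_integer_ else max(hits)
  }, integer(1))
}

# Apply the helper within each group, so indices never cross group boundaries.
per_group <- lapply(split(data, data$Group), function(g) {
  ind <- doubling_index(g$Num)
  g$divres <- g$Num / g$Num[ind]   # ratio to the most recent "half or less" value
  g$obs_n  <- g$Days[ind]          # day on which that value was observed
  g
})
res <- do.call(rbind, per_group)
rownames(res) <- NULL
```

The same helper should also work inside group_by(Group) %>% mutate(ind = doubling_index(Num), ...) if you prefer to stay in dplyr.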
Upvotes: 1
Reputation: 18683
You can create a function with a for loop to get the desired day as given below. Then use that to get the divres in a dplyr mutation.
obs_n <- function(x, days) {
lst <- list()
for(i in length(x):1){
obs <- days[which(rev(x[i]/x[(i-1):1]) >= 2)]
if(length(obs)==0)
lst[[i]] <- NA
else
lst[[i]] <- max(obs)
}
unlist(lst)
}
Then use dense_rank to obtain the row number corresponding to each obs_n. This is needed in case the days are not consecutive, i.e. have gaps.
library(dplyr)
data %>%
group_by(Group) %>%
mutate(obs_n=obs_n(Num, Days), divres=Num/Num[dense_rank(obs_n)])
# A tibble: 12 x 5
# Groups: Group [2]
Group Days Num obs_n divres
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 10 NA NA
2 1 2 12 NA NA
3 1 3 23 1 2.3
4 1 4 30 2 2.5
5 1 5 34 2 2.83
6 1 6 40 2 3.33
7 1 7 50 3 2.17
8 1 8 60 4 2
9 2 1 2 NA NA
10 2 2 4 1 2
11 2 3 8 2 2
12 2 4 12 2 3
Explanation of dense ranks (from Wikipedia). In dense ranking, items that compare equally receive the same ranking number, and the next item(s) receive the immediately following ranking number.
x <- c(NA, NA, 1,2,2,4,6)
dplyr::dense_rank(x)
# [1] NA NA  1  2  2  3  4
Compare with rank (default ties.method="average"). Note that NAs are placed at the end by default.
rank(x)
[1] 6.0 7.0 1.0 2.5 2.5 4.0 5.0
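A possible caveat with the dense_rank step: it ranks the obs_n values among themselves, so it only equals the row position when every earlier day actually shows up in obs_n. A hedged alternative is to look the day up directly with match(). The sketch below uses made-up data with gaps in Days (not from the question), and hard-codes the obs_n values that the function above would return for it.

```r
# Hypothetical data with gaps in Days (not from the question).
Days <- c(1, 3, 4, 10)
Num  <- c(10, 11, 12, 24)

# Day of the doubling reference for each row: only the last row has an
# earlier value that is at most half of its own (24 / 12 = 2, on day 4).
obs_day <- c(NA, NA, NA, 4)

# match() returns the position of each obs_day within Days, so it still
# points at the right row when days are skipped or never selected.
row_idx <- match(obs_day, Days)
divres  <- Num / Num[row_idx]
```

Here row_idx for the last row is 3, giving 24/12 = 2. dense_rank(obs_day) would return 1 for that row instead, which would divide by the day-1 value (24/10 = 2.4).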
Upvotes: 0