Stataq
Stataq

Reputation: 2301

how to do conditional calculation with strange requirement

I have strange problem with calculation, and I am not sure what I should do. I have a data that looks like this: enter image description here

and I need to sort by ID and Date at first,which I did. Then i need to find the baseline date, only if duration for that date is <=0 and closest to 0, that one can be used as baseline, then I need to calculate usable=current score/baseline date score. so the final results should look like this:

enter image description here

What should I do? How can I check the oldest day and build "usable" to use the score/oldest score?

The codes for sample data are:

ID <-c("1","1","1","1","2","2","2","2")
Date<- c("4/19/2018","7/27/2018","8/24/2018","9/21/2018","10/19/2018","12/14/2018","1/11/2019","1/24/2019")
Duration <- c("-13","-7","95","142","2","36","75","81")
score <- c("0.06","0.071","0.054","0.0258","0.0208","0.0448","0.0638","0.0227")

Sample.data <- data.frame(ID, Date, Duration, score)

Upvotes: 2

Views: 55

Answers (1)

akrun
akrun

Reputation: 886938

The columns in 'Sample.data' are all character class as the values were quoted (used R 4.0.0. If it was < R 4.0, stringsAsFactors = TRUE by default), so we used type.convert to change the class based on the values automatically, then before we do the arrange on 'ID', 'Date', convert the 'Date' to Date class (in case there are some inconsistency in the original data with respect to the order), after grouping by 'ID', create the new column 'Useable' with an if/else condition to return the standardized 'score' with the first value of 'score' or else return NA

library(dplyr)
library(lubridate)
Sample.data <- Sample.data %>% 
     type.convert(as.is = TRUE) %>%
     mutate(Date = mdy(Date)) %>% 
     arrange(ID, Date) %>% 
     group_by(ID) %>% 
     mutate(Useable = if(first(Duration) <=0) c(NA, score[-1]/first(score))
                 else NA_real_)
Sample.data
# A tibble: 8 x 5
# Groups:   ID [2]
#     ID Date       Duration  score Useable
#  <int> <date>        <int>  <dbl>   <dbl>
#1     1 2018-04-19      -13 0.06     NA   
#2     1 2018-07-27       86 0.071     1.18
#3     1 2018-08-24       95 0.054     0.9 
#4     1 2018-09-21      142 0.0258    0.43
#5     2 2018-10-19        2 0.0208   NA   
#6     2 2018-12-14       36 0.0448   NA   
#7     2 2019-01-11       75 0.0638   NA   
#8     2 2019-01-24       81 0.0227   NA   

Upvotes: 2

Related Questions