Possible to replicate this in R?

Question

For this question, i have given just one CaseNo as an example. In my df, i have millions.

Essentially what i want if the submission time is same for case no, the first row should equal one and all the other rows should equal 0 for that submission time, then for the submission time that is different, it should be 2 and so on.

df

CaseNo<-c(1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096)
SubmissionTime<-c("2019-12-06  2:06:59 PM","2019-12-06  2:06:59 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM")

What i want:

CaseNo<-c(1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096,1834096)
SubmissionTime<-c("2019-12-06  2:06:59 PM","2019-12-06  2:06:59 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-08  2:18:54 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM","2020-05-12  7:25:03 PM")
Rank<-c(1,0,2,0,0,0,0,3,0,0,0)

For every new case, it should start from 1.

Also, i have already arranged the df such that it goes from the oldest submissiontime to the most recent

I was thinking a combination of dense_rank and/or rank using dplyr?

bouncyball · Accepted Answer

I wrote a function to perform this operation. This function does assume the data is already ordered, as you mentioned this in your original question. The na.omit bit is because you get an NA when operating on the first value in the vector.

Since the operations are wrapped in a function, it may be helpful to examine each line step-by-step outside the function.

make_rank <- function(x){
  # compare current value to previous value, take cumulative sum
  x_1 <- cumsum(c(1, na.omit(x != lag(x))))
  # compare again, replacing with 0 where values match
  c(1, na.omit(ifelse(x_1 != lag(x_1), x_1, 0)))
}

make_rank(SubmissionTime)
# [1] 1 0 2 0 0 0 0 3 0 0 0

Possible to replicate this in R?

Answers (2)

Related Questions