Piccinin1992
Piccinin1992

Reputation: 249

Can I replicate the below in dplyr?

If I have a dataset like the one mentioned below:

d<-structure(list(ReviewType= c("Review","Review","Review","Correction","Correction","Review","Review","Review","Review","Review","Correction","Correction","Deficiency","Correction","Correction", 
                                "Correction", "Deficiency", "Deficiency", "Correction","Deficiency"),
                  Submissiondate= c("2020-08-29 04:32:00","2020-08-28 04:31:00","2020-08-26 04:31:00","2020-08-25 04:31:00","2020-08-24 04:31:00","2020-08-23 04:31:00","2020-08-22 04:31:00","2020-08-21 04:31:00","2020-08-20 04:31:00","2020-08-19 04:31:00",
"2020-09-27 04:31:00","2020-09-27 03:52:59","2020-09-28 17:30:00","2020-09-29 14:01:00",
                                    "2020-09-05 03:00:00","2020-09-05 03:51:00", "2020-09-03 23:59:49",
                                    "2020-09-02 00:03:54","2020-09-01 00:04:48","2020-08-31 23:49:30" ),
                  CaseNo= c("124","123","125","121","121","125","123","123","123","123","123","123","123","125","123","123","123","124","123","121")), class = "data.frame", row.names = c(NA, -20L))


d<-d%>%arrange(CaseNo,Submissiondate)

If I want to get total count of cases that had an outstanding (latest ReviewType) review type of Correction by the end of each week (Saturday 23:59:59), how would I do it?

Below is the desired output. Able to get Week using

mutate(Week = paste(lubridate::floor_date(Submissiondate, unit = "weeks"), "-", lubridate::ceiling_date(Submissiondate, unit = "weeks")))
      Week                Count   (for help)
2020-08-19 - 2020-08-22     0
2020-08-22 - 2020-08-29     1  (Correction 2020-08-25 04:31:00    121)
2020-08-29 - 2020-09-05     2  (Correction 2020-09-05 03:51:00    123 Correction 2020-08-25 04:31:00    121)
2020-09-27 - 2020-09-29     3  (Correction 2020-09-29 14:01:00    125 Correction 2020-09-27 04:31:00    123 Correction  2020-08-25 04:31:00    121)

Upvotes: 0

Views: 63

Answers (1)

eduardokapp
eduardokapp

Reputation: 1751

Okay, first of all, your question is still a little confusing, so I'll try my best here.

To find out for every week and for every case number the cases where correction type is "Correction", we first need a quick way to find the weeks and find the latest datetimes for each week.

Using

d$unixtime <- as.numeric(as.POSIXct(d$Submissiondate))

d$week <- lubridate::week(as.POSIXct(d$Submissiondate))

gets us a column that labels the week and a column that tranlates datetime to epochtime, or unixtime. One of the advantages of using epochtime is that it is easier to figure out which points are ahead or behind of each other, time-wise of course. Say, the latest time point is found by using max(unixtime).

Now the next challenge is to filter by week and then perform the search. What I did was filter by correction type and then group by case number and summarize by max (latest) times.

results <- lapply(unique(d$week), function(x){
  d_filtered <- dplyr::filter(d, d$week == x & d$ReviewType == "Correction")
  latest_cases <- d_filtered %>% group_by(CaseNo) %>% summarize(max.unixtime = max(unixtime))
  week_results <- as.list(latest_cases)
  return(week_results)
})

To find all the cases that fit these criteria and to find the count of these cases by week, use this:

cases <- unlist(lapply(results, function(x){
  return(x$CaseNo)
}))

count_by_week <- unlist(lapply(results, function(x){
  return(length(x$CaseNo))
}))

Upvotes: 1

Related Questions