Select earliest date and latest date dependent on ID column in R

Question

Dataset

df <- structure(list(x = c(1, 5, 2, 2, 4, 2, 5, 5, 4, 2, 1, 4, 3, 5, 
4, 1, 2, 3, 1, 3), y = structure(c(13520, 17333, 17422, 17096, 
17096, 18140, 11899, 11759, 17422, 15302, 12547, 17096, 17152, 
17096, 12547, 11423, 15302, 17422, 13867, 12547), class = "Date")), row.names = c(23L, 
87L, 55L, 38L, 40L, 115L, 27L, 135L, 53L, 122L, 11L, 48L, 61L, 
46L, 12L, 83L, 127L, 49L, 104L, 1L), class = "data.frame")

For x values 1 to 4 I want the latest date, but for x == 5 I want the earliest date.

I can subset them by number and run two separate queries:

library(data.table)
less_than_5 <- subset(df, x < 5)
g <- setDT(less_than_5)[, .SD[which.max(y)], keyby = x]

I would then do the same for x == 5, using which.min(y) instead.

I was wondering if I could do the whole query in one line rather than subsetting into 1-4 and 5 as separate queries.

UPDATE:

If each row also has a participant ID attached to it, some of which are repeated, is there a way of doing this with the keyby argument? That is, for each participant I would like to know the latest date on which any of 1:4 is mentioned. However, if the value is 5, I want the earliest date.

Ronak Shah · Accepted Answer

You can use if/else inside the grouped expression:

library(data.table)
setDT(df)[, if(first(x) != 5) max(y) else min(y), x]

#   x         V1
#1: 1 2007-12-20
#2: 5 2002-03-13
#3: 2 2019-09-01
#4: 4 2017-09-13
#5: 3 2017-09-13
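
For the participant case in the UPDATE, one possible sketch is to group by the participant and by whether the value is 5. The `id` column, its values, and the toy dates below are hypothetical, since the posted data has no participant column. The sketch also assumes that "latest date for 1:4" means the latest date across all of a participant's rows with x in 1:4.

```r
library(data.table)

# Toy data: the `id` column and all values here are made up for illustration.
dt <- data.table(
  id = c("a", "a", "a", "a", "b", "b", "b"),
  x  = c(1, 3, 5, 5, 2, 5, 4),
  y  = as.Date(c("2017-01-10", "2018-06-01", "2016-03-05", "2015-02-20",
                 "2019-09-01", "2014-07-07", "2012-12-12"))
)

# Group by participant and by a computed flag (x == 5).
# Within each group the flag is a single TRUE/FALSE, so it can drive if/else:
# earliest date when the group holds 5s, latest date otherwise.
res <- dt[, .(date = if (is_five) min(y) else max(y)),
          keyby = .(id, is_five = x == 5)]

print(res)
```

If a separate result per individual value of x is wanted instead, grouping with `keyby = .(id, x)` and testing `x[1] != 5`, as in the answer above, should work the same way.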
