Reputation: 530
Dataset
df <- structure(list(x = c(1, 5, 2, 2, 4, 2, 5, 5, 4, 2, 1, 4, 3, 5,
4, 1, 2, 3, 1, 3), y = structure(c(13520, 17333, 17422, 17096,
17096, 18140, 11899, 11759, 17422, 15302, 12547, 17096, 17152,
17096, 12547, 11423, 15302, 17422, 13867, 12547), class = "Date")), row.names = c(23L,
87L, 55L, 38L, 40L, 115L, 27L, 135L, 53L, 122L, 11L, 48L, 61L,
46L, 12L, 83L, 127L, 49L, 104L, 1L), class = "data.frame")
For x values 1 to 4 I want to find the latest date, but for x = 5 I want the earliest date.
I can subset them by number and run two separate queries:
less_than_5 <- subset(df, x < 5)
g <- setDT(less_than_5)[, .SD[which.max(y)], keyby = x]
And then do the same for x == 5, but with which.min(y) instead.
I was wondering if I could do the whole query in one line rather than subsetting into 1-4 and 5 as separate queries.
UPDATE:
If each row also has a participant ID attached to it, some of which are repeated, is there a way of doing this with the keyby feature? That is, for each participant I would like to know the latest date on which one of 1:4 is mentioned. However, if the value is 5, then I want to know the earliest date.
Upvotes: 1
Views: 92
Reputation: 388907
You can use if/else:
library(data.table)
setDT(df)[, if(first(x) != 5) max(y) else min(y), x]
#   x         V1
#1: 1 2007-12-20
#2: 5 2002-03-13
#3: 2 2019-09-01
#4: 4 2017-09-13
#5: 3 2017-09-13
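For the participant case in the update, one possible reading is to group by both the participant and x, and keep the same if/else inside j. The sketch below assumes a participant column named id, which is hypothetical since it is not in the posted dataset, and uses a small made-up table purely for illustration.

```r
library(data.table)

# Made-up illustration data; `id` is an assumed participant column
# that does not exist in the dataset from the question.
dt <- data.table(
  id = c("a", "a", "a", "a", "b", "b", "b"),
  x  = c(1, 1, 5, 5, 2, 5, 5),
  y  = as.Date(c("2020-01-01", "2021-06-15", "2019-03-10", "2018-07-04",
                 "2017-09-13", "2016-02-02", "2015-11-30"))
)

# Within each (id, x) group: latest date unless x is 5, in which case earliest.
# keyby sorts the result by id, then x.
res <- dt[, .(date = if (first(x) != 5) max(y) else min(y)), keyby = .(id, x)]
print(res)
```

If you need the whole row rather than just the date, replacing max(y) and min(y) with .SD[which.max(y)] and .SD[which.min(y)] should work in the same way.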
Upvotes: 1