Reputation: 21
I have a data frame like this
uuid item timestamp value
1 1 2020-07-15 1
1 1 2020-08-16 2
1 2 2020-07-15 2
1 2 2020-07-15 2
1 2 2020-08-16 3
2 1 2020-07-18 1
2 1 2020-07-18 2
2 1 2020-08-16 2
2 1 2020-08-17 3
2 2 2020-07-18 1
2 2 2020-08-11 2
2 2 2020-08-17 1
participants finished more than 2 times for each items, I want to select responses which are finished in earlist and latest date as the test and retest. The resulting data could be showed like this:
uuid item timestamp value
1 1 2020-07-15 1
1 1 2020-08-16 2
1 2 2020-07-15 2
1 2 2020-07-15 2 (deplicated row)
1 2 2020-08-16 3
2 1 2020-07-18 1
2 1 2020-07-18 2 (deplicated row)
2 1 2020-08-16 2 (deplicated row)
2 1 2020-08-17 3
2 2 2020-07-18 1
2 2 2020-08-11 2 (deplicated row)
2 2 2020-08-17 1
To reach my goal, how could I do in R?
Upvotes: 2
Views: 48
Reputation: 39858
One dplyr
option could be:
df %>%
mutate(timestamp = as.Date(timestamp, format = "%Y-%m-%d")) %>%
group_by(uuid, item) %>%
slice(c(which.min(timestamp), which.max(timestamp)))
uuid item timestamp value
<int> <int> <date> <int>
1 1 1 2020-07-15 1
2 1 1 2020-08-16 2
3 1 2 2020-07-15 2
4 1 2 2020-08-16 3
5 2 1 2020-07-18 1
6 2 1 2020-08-17 3
7 2 2 2020-07-18 1
8 2 2 2020-08-17 1
Upvotes: 3