Qinglan
Qinglan

Reputation: 21

Find duplicated rows based on 2 columns and selecte the first and last rows in Data Frame

I have a data frame like this

uuid  item   timestamp     value
  1    1     2020-07-15      1
  1    1     2020-08-16      2
  1    2     2020-07-15      2
  1    2     2020-07-15      2
  1    2     2020-08-16      3
  2    1     2020-07-18      1
  2    1     2020-07-18      2
  2    1     2020-08-16      2
  2    1     2020-08-17      3
  2    2     2020-07-18      1
  2    2     2020-08-11      2
  2    2     2020-08-17      1

participants finished more than 2 times for each items, I want to select responses which are finished in earlist and latest date as the test and retest. The resulting data could be showed like this:

uuid  item   timestamp     value
  1    1     2020-07-15      1
  1    1     2020-08-16      2   
  1    2     2020-07-15      2
  1    2     2020-07-15      2   (deplicated row)
  1    2     2020-08-16      3
  2    1     2020-07-18      1
  2    1     2020-07-18      2   (deplicated row)
  2    1     2020-08-16      2   (deplicated row)
  2    1     2020-08-17      3
  2    2     2020-07-18      1
  2    2     2020-08-11      2   (deplicated row)
  2    2     2020-08-17      1

To reach my goal, how could I do in R?

Upvotes: 2

Views: 48

Answers (1)

tmfmnk
tmfmnk

Reputation: 39858

One dplyr option could be:

df %>%
 mutate(timestamp = as.Date(timestamp, format = "%Y-%m-%d")) %>%
 group_by(uuid, item) %>%
 slice(c(which.min(timestamp), which.max(timestamp)))

   uuid  item timestamp  value
  <int> <int> <date>     <int>
1     1     1 2020-07-15     1
2     1     1 2020-08-16     2
3     1     2 2020-07-15     2
4     1     2 2020-08-16     3
5     2     1 2020-07-18     1
6     2     1 2020-08-17     3
7     2     2 2020-07-18     1
8     2     2 2020-08-17     1

Upvotes: 3

Related Questions