Delete rows with respect to a time constraint

Question

I have the following dataset:

>dput(df)
 structure(list(Author = c("hitham", "Ow", "WPJ4", "Seb", "Karen", "Ow", "Ow", "hitham", "Sarah",
 "Rene"), diff = structure(c(28, 2, 8, 3, 7, 8, 11, 1, 4, 8), class = "difftime", units = "secs")), 
 row.names = 1:10, class = "data.frame")

As we can see, the author Ow appears three times and author hitham two times:

    Author    diff
 1  hitham 28 secs
 2      Ow  2 secs
 3    WPJ4  8 secs
 4     Seb  3 secs
 5   Karen  7 secs
 6      Ow  8 secs
 7      Ow 11 secs
 8  hitham  1 secs
 9   Sarah  4 secs
 10   Rene  8 secs

These rows represent activities performed by the authors. For example, hitham performs an activity after 1 sec and then again after 28 secs.

I would like to make sure that there are at least 10 seconds between one activity and another.

I would like to delete those activities (lines) that do not meet this requirement. For example, Ow performs its activity after 2 secs and then after 8 secs: the latter should be deleted. The desired result is then:

    Author    diff
 1  hitham 28 secs
 2      Ow  2 secs
 3    WPJ4  8 secs
 4     Seb  3 secs
 5   Karen  7 secs
 6      Ow 11 secs
 7  hitham  1 secs
 8   Sarah  4 secs
 9    Rene  8 secs

Edit. I add this hoping to be clearer. Let us consider hitham. If we consider hitham rows (sorted by diff field):

   hitham  1 secs
   hitham 28 secs

Since (28-1)+1 > 10, there is no need to delete either of them.

Let us now consider Ow.

       Ow  2 secs
       Ow  8 secs
       Ow 11 secs

The differences in seconds between consecutive rows are (see last column):

       Ow  2 secs  -
       Ow  8 secs  7
       Ow 11 secs  4

The desired result can be obtained by deleting the first row whose value in the last column is less than 10. In fact:

       Ow  2 secs  -
       Ow 11 secs  10

We don't have to delete the last line because the difference here is exactly 10.
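The arithmetic above can be checked with a small sketch in base R. It assumes the inclusive counting used in this edit, (later - earlier) + 1, and the helper name inclusive_gaps is made up for illustration only. Note that diff() here is the base R function, not the diff column of the data frame.

```r
# Hypothetical helper: inclusive gap between consecutive activities,
# counted as (later - earlier) + 1, on plain numeric seconds.
inclusive_gaps <- function(secs) diff(sort(secs)) + 1

hitham <- c(28, 1)
ow     <- c(2, 8, 11)

inclusive_gaps(hitham)   # 28   -> nothing to delete
inclusive_gaps(ow)       # 7 4  -> the 8-second row is too close to the 2-second row
inclusive_gaps(ow[-2])   # 10   -> after dropping the 8-second row the constraint holds
```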

Ben · Accepted Answer

Based on this answer you could try a recursive approach.

library(dplyr)

my_fun <- function(d, ind = 1) {
  # First index whose diff is at least 9 secs after d[ind];
  # which(...)[1] is NA when no such index exists
  ind.next <- which(d - d[ind] >= 9)[1]
  if (is.na(ind.next))
    return(ind)
  else
    return(c(ind, my_fun(d, ind.next)))
}

df %>%
  group_by(Author) %>%
  arrange(diff) %>%
  slice(my_fun(diff))

On each call, the function finds ind.next, the first index whose diff is at least 9 seconds greater than the diff at ind. A raw difference of 9 corresponds to the 10 seconds counted inclusively in the question, e.g. (11-2)+1 = 10. If there's no ind.next available, it returns ind. Otherwise, it calls itself again starting from ind.next and concatenates the result with ind.
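For comparison, the same greedy selection can be sketched as a loop instead of a recursion. This is only an illustration, not part of the original answer: the name keep_spaced and the min_gap argument are assumptions, and the sketch assumes d is already sorted in ascending order (as it is after arrange(diff)) and measured in seconds.

```r
# Hypothetical iterative equivalent of my_fun.
# Assumes d is sorted ascending; as.numeric() drops the difftime class
# and keeps the values in their current units (seconds here).
keep_spaced <- function(d, min_gap = 9) {
  d <- as.numeric(d)
  keep <- integer(0)
  last_kept <- -Inf              # so the first row is always kept
  for (i in seq_along(d)) {
    if (d[i] - last_kept >= min_gap) {
      keep <- c(keep, i)         # far enough from the last kept row
      last_kept <- d[i]
    }
  }
  keep
}

keep_spaced(c(2, 8, 11))   # indices 1 and 3: the 8-second row is dropped
keep_spaced(c(1, 28))      # indices 1 and 2: both rows are kept
```

It could be used in the same pipeline in place of my_fun, i.e. slice(keep_spaced(diff)).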

Output

  Author diff   
  <chr>  <drtn> 
1 hitham  1 secs
2 hitham 28 secs
3 Karen   7 secs
4 Ow      2 secs
5 Ow     11 secs
6 Rene    8 secs
7 Sarah   4 secs
8 Seb     3 secs
9 WPJ4    8 secs
