Reputation: 1769
I have the following dataset:
>dput(df)
structure(list(Author = c("hitham", "Ow", "WPJ4", "Seb", "Karen", "Ow", "Ow", "hitham", "Sarah",
"Rene"), diff = structure(c(28, 2, 8, 3, 7, 8, 11, 1, 4, 8), class = "difftime", units = "secs")),
row.names = 1:10, class = "data.frame")
As we can see, the author Ow appears three times and the author hitham twice:
Author diff
1 hitham 28 secs
2 Ow 2 secs
3 WPJ4 8 secs
4 Seb 3 secs
5 Karen 7 secs
6 Ow 8 secs
7 Ow 11 secs
8 hitham 1 secs
9 Sarah 4 secs
10 Rene 8 secs
These rows represent activities performed by the authors. For example, hitham performs an activity after 1 sec and then a second one after 28 secs.
I would like to make sure that there are at least 10 seconds between one activity and another.
I would like to delete the activities (rows) that do not meet this requirement. For example, Ow performs an activity after 2 secs and then another after 8 secs: the latter should be deleted. The desired result is then:
Author diff
1 hitham 28 secs
2 Ow 2 secs
3 WPJ4 8 secs
4 Seb 3 secs
5 Karen 7 secs
6 Ow 11 secs
7 hitham 1 secs
8 Sarah 4 secs
9 Rene 8 secs
Edit: I add this hoping to be clearer. Let us consider hitham. If we look at the hitham rows (sorted by the diff field):
hitham 1 secs
hitham 28 secs
We have (28-1)+1 = 28 > 10, so there is no need to delete either of them.
Let us now consider Ow.
Ow 2 secs
Ow 8 secs
Ow 11 secs
The differences in seconds between consecutive rows, counted as (current - previous) + 1, are shown in the last column:
Ow 2 secs -
Ow 8 secs 7
Ow 11 secs 4
The desired result can be obtained by deleting the first row that shows a number less than 10 in the last column. In fact:
Ow 2 secs -
Ow 11 secs 10
We don't have to delete the last line because the difference here is exactly 10.
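To make the arithmetic above concrete, here is a minimal base R sketch for the Ow rows only. It assumes the gap is counted as (later - earlier) + 1, as in the hitham example; the variable names ow, gaps and gap_after are just for illustration.

```r
# Sorted diff values for the author Ow, in seconds (taken from the data above)
ow <- c(2, 8, 11)

# Gap between consecutive rows, counted as (later - earlier) + 1
gaps <- diff(ow) + 1
gaps
# 7 4

# After dropping the 8-second row, only one gap remains: (11 - 2) + 1
gap_after <- diff(c(2, 11)) + 1
gap_after
# 10
```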
Upvotes: 1
Views: 51
Reputation: 30494
Based on this answer you could try a recursive approach.
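library(dplyr)

# Return the indices of d (sorted ascending) to keep, starting from index ind
my_fun <- function(d, ind = 1) {
  # First index whose value is at least 9 seconds after d[ind]
  ind.next <- first(which(d - d[ind] >= 9))
  # Stop when nothing qualifies; first() on an empty vector may return NA
  # rather than a zero-length value, so both cases are checked
  if (length(ind.next) == 0 || is.na(ind.next))
    return(ind)
  else
    return(c(ind, my_fun(d, ind.next)))
}

df %>%
  group_by(Author) %>%
  arrange(diff) %>%
  slice(my_fun(diff))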
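Each time through the function, it identifies the next index ind.next: the first index whose diff is at least 9 seconds greater than the diff indexed by ind. The threshold is 9 rather than 10 because the question counts the gap as (later - earlier) + 1, so a raw difference of 9 corresponds to a gap of 10. If no such ind.next is available, the function returns ind. Otherwise, it calls itself again starting from ind.next and concatenates the result with ind.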
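As a rough illustration of those steps, here is a hand trace for the Ow rows using plain numbers instead of the difftime column (an assumption made only to keep the sketch short; d, step1 and step2 are illustrative names).

```r
# Sorted diff values for Ow, in seconds
d <- c(2, 8, 11)

# Step 1: start at index 1 (2 secs); which values are at least 9 secs later?
step1 <- which(d - d[1] >= 9)
step1
# 3   (11 - 2 = 9, so index 3 is the next one kept; index 2 is skipped)

# Step 2: continue from index 3 (11 secs); nothing is 9 secs later
step2 <- which(d - d[3] >= 9)
length(step2)
# 0   (no further index, so the recursion stops with indices 1 and 3)
```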
Output
Author diff
<chr> <drtn>
1 hitham 1 secs
2 hitham 28 secs
3 Karen 7 secs
4 Ow 2 secs
5 Ow 11 secs
6 Rene 8 secs
7 Sarah 4 secs
8 Seb 3 secs
9 WPJ4 8 secs
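Note that this output is ordered by Author, while the desired result in the question keeps the original row order. If that matters, one possible base R sketch is below. It is not the approach from the answer above: keep_rows is a hypothetical helper that applies the same "at least 9 seconds after the last kept row" rule with a loop, and it assumes the diff column is in seconds.

```r
df <- data.frame(
  Author = c("hitham", "Ow", "WPJ4", "Seb", "Karen",
             "Ow", "Ow", "hitham", "Sarah", "Rene"),
  diff = as.difftime(c(28, 2, 8, 3, 7, 8, 11, 1, 4, 8), units = "secs")
)

# Hypothetical helper: for one author's values (any order), flag the rows to keep
keep_rows <- function(secs, min_gap = 9) {
  keep <- logical(length(secs))
  last <- -Inf                      # nothing kept yet
  for (i in order(secs)) {          # visit rows from smallest to largest value
    if (secs[i] - last >= min_gap) {
      keep[i] <- TRUE
      last <- secs[i]
    }
  }
  keep
}

secs <- as.numeric(df$diff)         # plain numbers, assumed to be seconds
keep <- logical(nrow(df))
for (a in unique(df$Author)) {
  idx <- which(df$Author == a)      # row positions of this author
  keep[idx] <- keep_rows(secs[idx])
}

result <- df[keep, ]                # original row order is preserved
result
```

Only the row with Ow at 8 secs (row 6) is dropped; the row names of the result are not renumbered.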
Upvotes: 1