Reputation: 661
I have a data frame with clickstream data. I'm interested in what happened just before and just after certain events defined by a boolean expression involving multiple columns -- i.e., given a boolean expression, I would like to output a subset of the original data frame which includes 10 rows above and below each row satisfying the expression. Is there an elegant way of doing this, for example using dplyr?
Adding a reproducible example:
df <- data.frame(col1 = c(rep("a",20), rep("b",20)), col2 = c(1:20, 1:20))
look_around(df, col1 == "a" & col2 %in% c(17,20))
This call should produce df[7:30,].
How can I write the function look_around?
Upvotes: 0
Views: 53
Reputation: 13108
This seems like a variation on subset, so I adapted the following from subset:
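look_around <- function(data, condition, before = 10, after = 10) {
  # Capture the unevaluated condition and evaluate it inside `data`,
  # falling back to the caller's environment for other variables
  e <- substitute(condition)
  r <- eval(e, data, parent.frame())
  # Collect the window of row indices around each matching row;
  # unlist(lapply(...)) also copes with the case of no matches
  rows <- unique(unlist(lapply(which(r), function(x) {
    (x - before):(x + after)
  })))
  # Drop indices that fall outside the data frame
  rows <- rows[rows > 0 & rows <= nrow(data)]
  data[rows, ]
}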
Output:
> df <- data.frame(col1 = c(rep("a",20), rep("b",20)), col2 = c(1:20, 1:20))
> look_around(df, col1 == "a" & col2 %in% c(17,20), before=10, after=10)
col1 col2
7 a 7
8 a 8
9 a 9
<snip>
30 b 10
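If the non-standard evaluation gets in the way (for example when calling this from inside another function), a variant that takes an already-evaluated logical vector may be easier to work with. The following is only a sketch: `look_around_lgl` and its `hit` argument are hypothetical names that do not appear in the code above, and it assumes `hit` has one entry per row of `data`.

```r
# Hypothetical helper: same windowing idea, but the condition is passed
# as a plain logical vector rather than an unevaluated expression.
look_around_lgl <- function(data, hit, before = 10, after = 10) {
  stopifnot(is.logical(hit), length(hit) == nrow(data))
  idx <- which(hit)  # positions of matching rows; NA entries are dropped
  # Build the window around every match and remove duplicates;
  # as.integer() turns the NULL from "no matches" into integer(0)
  rows <- unique(as.integer(unlist(lapply(idx, function(i) {
    (i - before):(i + after)
  }))))
  # Keep only indices that exist in the data frame
  rows <- rows[rows >= 1 & rows <= nrow(data)]
  data[rows, , drop = FALSE]
}

df <- data.frame(col1 = c(rep("a", 20), rep("b", 20)), col2 = c(1:20, 1:20))
res <- look_around_lgl(df, df$col1 == "a" & df$col2 %in% c(17, 20))
# Compare against the subset requested in the question
print(identical(res, df[7:30, ]))
```

The trade-off is that the caller has to write `df$col1` instead of a bare `col1`, but nothing depends on where the expression is evaluated.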
Upvotes: 2