Victor Kostyuk

Reputation: 661

"Look around" subsetting function

I have a data frame with clickstream data. I'm interested in what happened just before and just after certain events defined by a boolean expression involving multiple columns -- i.e., given a boolean expression, I would like to output a subset of the original data frame which includes 10 rows above and below each row satisfying the expression. Is there an elegant way of doing this, for example using dplyr?

Adding a reproducible example:

df <- data.frame(col1 = c(rep("a",20), rep("b",20)), col2 = c(1:20, 1:20))

look_around(df, col1 == "a" & col2 %in% c(17,20)) should produce df[7:30,]

Write the function look_around.
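For reference, here is a rough sketch of why the expected result is df[7:30,]: the condition matches rows 17 and 20, and the two 10-row windows (7:27 and 10:30) merge into one range. The names `hits` and `window` are only illustrative and are not part of the function being asked for.

```r
df <- data.frame(col1 = c(rep("a", 20), rep("b", 20)), col2 = c(1:20, 1:20))

# Rows where the condition holds
hits <- which(df$col1 == "a" & df$col2 %in% c(17, 20))

# 10 rows either side of each hit, merged and clipped to the valid row range
window <- sort(unique(c(outer(hits, -10:10, "+"))))
window <- window[window >= 1 & window <= nrow(df)]
```

Here `window` covers rows 7 through 30, which is the subset that look_around should return.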

Upvotes: 0

Views: 53

Answers (1)

Weihuang Wong

Reputation: 13108

This seems like a variation on subset, so I adapted the following from the source of subset, which captures the condition with substitute and evaluates it inside the data frame:

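look_around <- function(data, condition, before = 10, after = 10) {
    # Capture the unevaluated condition, then evaluate it with the columns of `data` in scope
    e <- substitute(condition)
    r <- eval(e, data, parent.frame())
    # Build a window of row indices around each match;
    # unlist(lapply()) also copes with zero matches, where sapply() would return a list
    rows <- unique(unlist(lapply(which(r), function(x) {
        (x - before):(x + after)
    })))
    # Drop indices that fall outside the data frame
    rows <- rows[rows > 0 & rows <= nrow(data)]
    data[rows, ]
}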

Output:

> df <- data.frame(col1 = c(rep("a",20), rep("b",20)), col2 = c(1:20, 1:20))
> look_around(df, col1 == "a" & col2 %in% c(17,20), before=10, after=10)
   col1 col2
7     a    7
8     a    8
9     a    9
<snip>
30    b   10
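As a further illustration, the sketch below shows how the window is clipped at the top of the data frame and how `before` and `after` can differ. The function is repeated from the answer so the block runs on its own; `near_start` and `asym` are just illustrative names.

```r
# Copy of look_around from the answer, repeated so this block runs on its own
look_around <- function(data, condition, before = 10, after = 10) {
    e <- substitute(condition)
    r <- eval(e, data, parent.frame())
    rows <- unique(as.vector(sapply(which(r), function(x) {
        (x - before):(x + after)
    })))
    rows <- rows[rows > 0 & rows <= nrow(data)]
    data[rows, ]
}

df <- data.frame(col1 = c(rep("a", 20), rep("b", 20)), col2 = c(1:20, 1:20))

# Match at row 2 with the default window: indices below 1 are dropped, leaving rows 1 to 12
near_start <- look_around(df, col1 == "a" & col2 == 2)

# Asymmetric window around row 25 (col1 == "b", col2 == 5): rows 23 to 26
asym <- look_around(df, col1 == "b" & col2 == 5, before = 2, after = 1)
```

Because the out-of-range indices are filtered before subsetting, a match near either end simply returns a shorter window rather than an error or NA rows.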

Upvotes: 2
