Reputation: 2306
I have data frame like this:
Input = (" v1 v2
1 A1 0
2 B1 0
3 C1 0
4 D1 1
5 E1 0
6 F1 0
7 G1 0
8 H1 0
9 I1 0
10 J1 0
11 K1 0
12 A2 1
13 B2 0
14 C2 0
15 D2 0
16 E2 0
17 F2 0
18 G2 0
19 H2 0
20 I2 0
21 J2 0
22 K2 0
")
df = as.data.frame(read.table(textConnection(Input), header = T, row.names=1))
And I'd like to keep only rows with 1 in v2
and 3 previous and next rows around each 1, so desired output is:
v1 v2
A1 0
B1 0
C1 0
D1 1
E1 0
F1 0
G1 0
I1 0
J1 0
K1 0
A2 1
B2 0
C2 0
D2 0
So we have all 1-rows (in this case 2) and 6 corresponding neighbor rows (3 lower, 3 upper). In orginal dataset I have 100k+ rows and only several 1-rows spreaded in whole dataset.
I tried to do this with simple ifelse()
in apply
for prevs and next rows separately and then combine everything together but it doesn't work.
prev <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x-1:3,0)))
next <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x+1:3,0)))
I was thinking to use lag()
and lead()
but I don't know how to lag or lead n=3
rows only around with 1 in v2
. Could you please help me out?
Upvotes: 1
Views: 61
Reputation: 9247
One possible solution (maybe a bit lengthy but very interesting for other purposes as well) is to create multiple lags and leads of the variable of interest and then filter for any variable that has value equal to 1.
We first create two functions that produce n lags and n leads, respectively, starting from a dataframe:
lags <- function(data, variable, n){
require(dplyr)
require(purrr)
variable <- enquo(variable)
indices <- seq_len(n)
quosures <- map(indices, ~quo(lag(!!variable, !!.x))) %>%
set_names(sprintf("lag_%02d", indices))
mutate(data, !!!quosures)
}
leads <- function(data, variable, n){
require(dplyr)
require(purrr)
variable <- enquo(variable)
indices <- seq_len(n)
quosures <- map(indices, ~quo(lead(!!variable, !!.x))) %>%
set_names(sprintf("lead_%02d", indices))
mutate(data, !!!quosures)
}
Then we apply them to our dataframe and filter the observations that contains a 1:
library(dplyr)
df %>%
lags(v2, n = 3) %>%
leads(v2, n = 3) %>%
filter_all(any_vars(. == 1)) %>%
select(v1, v2)
# v1 v2
# 1 A1 0
# 2 B1 0
# 3 C1 0
# 4 D1 1
# 5 E1 0
# 6 F1 0
# 7 G1 0
# 8 I1 0
# 9 J1 0
# 10 K1 0
# 11 A2 1
# 12 B2 0
# 13 C2 0
# 14 D2 0
Upvotes: 3
Reputation: 388817
We can find out indices where v2 = 1
occurs and use sapply
to generate row numbers -3 to +3 of each index.
#get row index where v2 = 1
inds <- which(df$v2 == 1)
#unique to remove overlapping row index
inds2 <- unique(c(sapply(inds, `+`, -3:3)))
#remove negative values or values which are greater than number of rows in df
inds2 <- inds2[inds2 > 0 & inds2 <= nrow(df)]
#select rows.
df[inds2, ]
# v1 v2
#1 A1 0
#2 B1 0
#3 C1 0
#4 D1 1
#5 E1 0
#6 F1 0
#7 G1 0
#9 I1 0
#10 J1 0
#11 K1 0
#12 A2 1
#13 B2 0
#14 C2 0
#15 D2 0
Upvotes: 2