Reputation: 17
I am trying to select relevant rows from a large time-series data set. The tricky bit is that the rows I need lie before and after certain values in a column.
# example data
x <- rnorm(100)
y <- rep(0,100)
y[c(13,44,80)] <- 1
y[c(20,34,92)] <- 2
df <- data.frame(x,y)
In this case the critical values are 1 and 2 in the df$y column. If, e.g., I want to select 2 rows before and 4 after df$y==1, I can do:
ones<-which(df$y==1)
selection <- NULL
for (i in ones) {
jj <- (i-2):(i+4)
selection <- c(selection,jj)
}
df$selection <- 0
df$selection[selection] <- 1
This, arguably, scales poorly for more values. For df$y==2 I would have to repeat with:
twos<-which(df$y==2)
selection <- NULL
for (i in twos) {
jj <- (i-2):(i+4)
selection <- c(selection,jj)
}
df$selection[selection] <- 2
The ideal scenario would be a function similar to this imaginary one: selector(data=df$y, values=c(1,2), before=2, after=5, afterafter=FALSE, beforebefore=FALSE), where values is fed with the critical values, before with the number of rows to select before each value, and after, correspondingly, with the number of rows to select after it.
afterafter would allow selecting a range that starts a certain number of rows after the value and ends a certain number of rows later, e.g. after=5, afterafter=10 (and the same in the other direction with beforebefore).
Any tips and suggestions are very welcome! Thanks!
Upvotes: 1
Views: 1084
Reputation: 38500
This is easy enough with rep and its each argument.
df$y[rep(which(df$y == 2), each=7L) + -2:4] <- 2
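Here, rep repeats each row index that matches your criterion 7 times (two rows before, the matching row itself, and four rows after; the L suffix marks 7 as an integer). Adding the offsets -2 through 4, which are recycled across each group of seven, gives the indices of the rows to select. The assignment then replaces the values at those positions. Note that this line writes into df$y itself; to fill the selection column from the question instead, index df$selection in the same way.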
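The same index arithmetic can be wrapped in a small helper that handles several critical values at once. This is only a sketch: the name select_around and its arguments are made up for illustration, and it assumes that windows running past the first or last row should be clipped and that, where windows overlap, the value processed last wins.

```r
# Hypothetical helper (not part of base R): label the rows around each
# critical value with that value, leaving 0 everywhere else.
select_around <- function(y, values, before = 2L, after = 4L) {
  selection <- integer(length(y))          # 0 means "not selected"
  offsets <- -before:after                 # e.g. -2, -1, 0, 1, 2, 3, 4
  for (v in values) {
    hits <- which(y == v)
    # repeat each hit once per offset, then add the recycled offsets
    idx <- rep(hits, each = length(offsets)) + offsets
    # assumption: drop indices that fall outside the data
    idx <- idx[idx >= 1L & idx <= length(y)]
    selection[idx] <- v
  }
  selection
}

# Same y column as in the question's example data
y <- rep(0, 100)
y[c(13, 44, 80)] <- 1
y[c(20, 34, 92)] <- 2

sel <- select_around(y, values = c(1, 2), before = 2, after = 4)
which(sel == 1)   # rows 11-17, 42-48 and 78-84
which(sel == 2)   # rows 18-24, 32-38 and 90-96
```

The loop runs over the critical values only, not over the matching rows, so it stays short even when there are many matches. The result can be stored with df$selection <- sel.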
Note that for some comparisons, == will not be adequate due to numerical precision. See the SO post "Why are these numbers not equal?" for a detailed discussion of this topic. In these cases, you could use something like
which(abs(df$y - 2) < 0.001)
or whatever precision measure will work for your problem.
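A short illustration of why the tolerance matters, using values that come out of floating-point arithmetic. The tolerance 1e-8 is an arbitrary choice for this sketch, not a recommendation.

```r
# 0.1 + 0.2 is not stored as exactly 0.3 in double precision
a <- 0.1 + 0.2
a == 0.3                     # FALSE with IEEE 754 doubles
abs(a - 0.3) < 1e-8          # TRUE: equal within the chosen tolerance
isTRUE(all.equal(a, 0.3))    # TRUE: all.equal uses its own default tolerance

# The same effect when searching a column for a critical value
y <- c(0, 0.1 + 0.2, 0, 0.3)
exact_hits <- which(y == 0.3)              # misses position 2
tol_hits   <- which(abs(y - 0.3) < 1e-8)   # finds positions 2 and 4
```

If the column holds whole numbers that were assigned directly, as in the example data, == is fine; the tolerance only matters once the values are computed.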
Upvotes: 1