Reputation: 847
Consider df1
:
df <- data.frame(time = c(1,2,3,4,5,6,7,8,9,10), marker = c(NA,NA,NA,"stop",NA,NA,NA,"start",NA,NA), behaviour = c("Rest","Rest","Rest","Rest","Awake","Awake","Awake","Awake","Awake","Rest"))
time marker behaviour
1 1 <NA> Rest
2 2 <NA> Rest
3 3 <NA> Rest
4 4 stop Rest
5 5 <NA> Awake
6 6 <NA> Awake
7 7 <NA> Awake
8 8 start Awake
9 9 <NA> Awake
10 10 <NA> Rest
I want to subset the data based on column markers
and not include the data that is between the element "stop" and "start so that df
looks like this:
time marker behaviour
1 <NA> Rest
2 <NA> Rest
3 <NA> Rest
4 stop Rest
8 start Awake
9 <NA> Awake
10 <NA> Rest
Upvotes: 2
Views: 56
Reputation: 14370
A cumsum
solution (I use data.table
as well but you don't have to) which generalizes to multiple stop/start
values would be:
library(data.table)
dt <- as.data.table(df)
dt[, drop := list(cumsum(marker=="stop" & !is.na(marker)) -
cumsum(marker=="start" & !is.na(marker)))][drop==0 | marker == "stop"]
# time marker behaviour drop
# 1: 1 NA Rest 0
# 2: 2 NA Rest 0
# 3: 3 NA Rest 0
# 4: 4 stop Rest 1
# 5: 8 start Awake 0
# 6: 9 NA Awake 0
# 7: 10 NA Rest 0
Upvotes: 1
Reputation: 1171
df <- data.frame(time = c(1,2,3,4,5,6,7,8,9,10), marker = c("NA","NA","NA","stop","NA","NA","NA","start","NA","NA"), behaviour = c("Rest","Rest","Rest","Rest","Awake","Awake","Awake","Awake","Awake","Rest"))
df1 <- as.integer(row.names(df[df$marker=="stop",]))+1
df2 <- as.integer(row.names(df[df$marker=="start",]))-1
ans <- df[-(df1:df2),]
Upvotes: 1
Reputation: 887981
We can use a numeric index to subset the rows
i1 <- with(df, which(marker %in% c("stop", "start")))
df[-((i1[1]+1):(i1[2]-1)),]
If there are multiple `start', 'stop', then, we can do
grp <- with(df, c(0, head(cumsum(marker == "stop" & !is.na(marker)),-1)))
df[with(df, ave(marker == "start" & !is.na(marker),
grp, FUN = function(x) !any(x)|cumsum(x)>0)),]
# time marker behaviour
#1 1 <NA> Rest
#2 2 <NA> Rest
#3 3 <NA> Rest
#4 4 stop Rest
#8 8 start Awake
#9 9 <NA> Awake
#10 10 <NA> Rest
Upvotes: 1