Bonono
Bonono

Reputation: 847

Delete rows in dataframe that are all previous to value in row

Consider df1:

df <- data.frame(time = c(1,2,3,4,5,6,7,8,9,10), marker = c(NA,NA,NA,"stop",NA,NA,NA,"start",NA,NA), behaviour = c("Rest","Rest","Rest","Rest","Awake","Awake","Awake","Awake","Awake","Rest"))

   time marker behaviour
1     1   <NA>      Rest
2     2   <NA>      Rest
3     3   <NA>      Rest
4     4   stop      Rest
5     5   <NA>     Awake
6     6   <NA>     Awake
7     7   <NA>     Awake
8     8  start     Awake
9     9   <NA>     Awake
10   10   <NA>      Rest

I want to subset the data based on column markers and not include the data that is between the element "stop" and "start so that df looks like this:

time marker behaviour
   1   <NA>      Rest
   2   <NA>      Rest
   3   <NA>      Rest
   4   stop      Rest
   8   start     Awake
   9   <NA>     Awake
   10  <NA>      Rest

Upvotes: 2

Views: 56

Answers (3)

Mike H.
Mike H.

Reputation: 14370

A cumsum solution (I use data.table as well but you don't have to) which generalizes to multiple stop/start values would be:

library(data.table)
dt <- as.data.table(df)

dt[, drop := list(cumsum(marker=="stop" & !is.na(marker)) - 
                    cumsum(marker=="start" & !is.na(marker)))][drop==0 | marker == "stop"]

   #    time marker behaviour drop
   # 1:    1     NA      Rest    0
   # 2:    2     NA      Rest    0
   # 3:    3     NA      Rest    0
   # 4:    4   stop      Rest    1
   # 5:    8  start     Awake    0
   # 6:    9     NA     Awake    0
   # 7:   10     NA      Rest    0

Upvotes: 1

G.Arima
G.Arima

Reputation: 1171

df <- data.frame(time = c(1,2,3,4,5,6,7,8,9,10), marker = c("NA","NA","NA","stop","NA","NA","NA","start","NA","NA"), behaviour = c("Rest","Rest","Rest","Rest","Awake","Awake","Awake","Awake","Awake","Rest"))

df1 <- as.integer(row.names(df[df$marker=="stop",]))+1
df2 <- as.integer(row.names(df[df$marker=="start",]))-1
ans <- df[-(df1:df2),]

Upvotes: 1

akrun
akrun

Reputation: 887981

We can use a numeric index to subset the rows

i1 <- with(df, which(marker %in% c("stop", "start")))
df[-((i1[1]+1):(i1[2]-1)),]

If there are multiple `start', 'stop', then, we can do

grp <- with(df, c(0, head(cumsum(marker == "stop" & !is.na(marker)),-1)))
df[with(df, ave(marker == "start" & !is.na(marker),
             grp, FUN = function(x) !any(x)|cumsum(x)>0)),]
#   time marker behaviour
#1     1   <NA>      Rest
#2     2   <NA>      Rest
#3     3   <NA>      Rest
#4     4   stop      Rest
#8     8  start     Awake
#9     9   <NA>     Awake
#10   10   <NA>      Rest

Upvotes: 1

Related Questions