Jojo

Reputation: 5211

Assign a binary vector based on blocks of data within another vector

I have a data frame:

dat <- data.frame(diffsecs=(c(189, 245, 13988, 2396, 29601, 263, 297, 292, 230, 257, 192, 
    286, 236, 261, 286, 268, 294, 260, 286, 299, 514, 2287, 234, 
    195, 250, 519, 560, 3314, 12340, 186, 184, 180, 180, 180, 180, 
    180, 180, 180, 180, 180, 3072, 180, 180, 206, 180, 180, 180, 
    360, 180, 180, 180, 180, 5220, 180, 437, 246, 218, 212, 472, 
    2356, 2641, 363, 425, 757, 403, 181, 355, 192, 192, 784, 238, 
    250, 261, 272, 2554, 29524, 4482, 6762, 1252, 269, 303, 294, 
    286, 273, 289, 274, 216, 255, 180, 252, 322, 238, 583, 289, 317, 
    308, 305, 308, 312, 330)))

It contains blocks of consecutive rows where diffsecs equals 180. I want to create a binary vector that equals 1 when diffsecs equals 180 and 0 otherwise, but only when the value sits in a block of 5 or more consecutive 180s. So if there are only 3 consecutive values of 180, the binary vector should be 0 for those rows.

I tried using the loop

total<- nrow(dat)
len<- 1:total

for(i in len){
  temp<- dat[i:(i+5),] 
  xdiff<- ifelse(mean(temp$diffsecs)>178 & mean(temp$diffsecs)<182 ,1,0)
  temp2<- cbind(dat[i,],xdiff)
  if(i==1) {dat2 <- temp2}
  else {dat2<- rbind(dat2,temp2)}

}

But this does not work: it also flags blocks that are shorter than required.

Upvotes: 2

Views: 190

Answers (2)

digEmAll

Reputation: 57220

You can take advantage of the great rle function and its inverse counterpart, inverse.rle:

RLE <- rle(dat$diffsecs)
RLE$values <- ifelse(RLE$values == 180 & RLE$lengths >= 5,1,0)
dat2 <- cbind(dat,binarycol=inverse.rle(RLE))

As correctly pointed out by @Frank, you can shorten the second line to:

RLE$values <- as.integer(RLE$values == 180 & RLE$lengths >= 5)

or even:

RLE$values <- RLE$values == 180 & RLE$lengths >= 5

if a logical vector (TRUE/FALSE) is acceptable instead of 0/1.
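To see what rle and inverse.rle are doing here, a minimal sketch on a toy vector may help. The vector x and the names r and flag are made up for illustration and are not part of the question's data:

```r
# Hypothetical toy input: one run of five 180s and one run of three 180s
x <- c(200, 180, 180, 180, 180, 180, 250, 180, 180, 180)

# rle() compresses x into runs of equal values
r <- rle(x)
# r$values  : 200 180 250 180
# r$lengths :   1   5   1   3

# Overwrite each run's value with 1 only if it is a run of 180s of length >= 5
r$values <- as.integer(r$values == 180 & r$lengths >= 5)

# inverse.rle() expands the runs back to the original length
flag <- inverse.rle(r)
# flag: 0 1 1 1 1 1 0 0 0 0
```

The run of three 180s at the end stays 0 because the condition is evaluated once per run, not once per row.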

Upvotes: 4

Frank

Reputation: 66819

With data.table, you can use rleid:

library(data.table)
setDT(dat)

# Group rows by runs of (diffsecs == 180); .N is the number of rows in each run
dat[, v := 
  (diffsecs==180)*(.N >= 5)
, by = rleid(diffsecs == 180)][]
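For readers without data.table, the grouping logic can be imitated in base R. This is only a sketch on a made-up toy vector: run_id stands in for the ids rleid would assign, run_len plays the role of .N, and all names here are illustrative assumptions:

```r
# Hypothetical toy input: one run of five 180s and one run of three 180s
x <- c(200, 180, 180, 180, 180, 180, 250, 180, 180, 180)

is180 <- x == 180

# Number the runs of is180 consecutively (base R stand-in for rleid)
r <- rle(is180)
run_id <- rep(seq_along(r$lengths), r$lengths)
# run_id: 1 2 2 2 2 2 3 4 4 4

# Length of the run each row belongs to (stand-in for .N within each group)
run_len <- ave(seq_along(x), run_id, FUN = length)

# 1 only where the row is 180 and its run has at least 5 rows
v <- is180 * (run_len >= 5)
# v: 0 1 1 1 1 1 0 0 0 0
```

Multiplying the two logical vectors acts as an AND and returns 0/1 directly, which is the same trick used in the data.table expression.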

Upvotes: 2
