Maximilian

Reputation: 4229

Label start of a sequence with NA's

This is a trivial question, but I can't seem to find a neat solution that doesn't involve excluding the NA's first and putting them back afterwards. So I'm looking for ideas that avoid excluding the NA's.

I would like to label the start of each run of 0's with 2 and the start of each run of 1's with 1, and replace everything else (the NA's and the remaining elements of each run) with 0.

Is the rle function useful here? A base R solution would be welcome.

Example:

x <- c(rep(NA,10),rep(1,5),rep(NA,5),rep(1,10),rep(NA,3),rep(0,7),rep(NA,15),rep(1,9))
r <- c(0,diff(x)); r[r %in% -1] <- 2

From this sample data:

x
[1] NA NA NA NA NA NA NA NA NA NA  1  1  1  1  1 NA NA NA NA NA  1  1  1  1  1  1  1  1  1  1 NA NA NA  0  0  0  0  0  0  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1  1  1  1  1  1  1  1  1

Desired output:

[1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

Upvotes: 1

Views: 49

Answers (1)

akrun

Reputation: 887213

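We could use rle to create a grouping variable ('gr') and split 'x' into a list of runs. Because rle treats every NA as a separate run, the NAs are first replaced with a sentinel value (-999) in a copy of 'x'. Within each run, match maps the first element to 1 (for 1) or 2 (for 0), and the rest of the run is filled with 0s. Finally we unlist and replace the remaining NA elements with 0.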

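# rle treats every NA as its own run, so swap in a sentinel first
xN <- x
xN[is.na(xN)] <- -999
v1 <- rle(xN)$lengths
# One group id per run
gr <- rep(seq_along(v1), v1)

# First element of each run: 1 -> 1, 0 -> 2, NA -> NA; the rest -> 0
x1 <- unlist(lapply(split(x, gr), function(run)
          c(match(run[1], 1:0), rep(0, length(run)-1))), use.names=FALSE)
x1[is.na(x1)] <- 0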
x1
#[1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
#[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
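
To see why the sentinel is needed, here is a small sketch on a toy vector. The names x_small, x_sent, runs and gr_small are made up for illustration and are not part of the answer's code.

```r
# rle() compares neighbours with !=, and NA != NA is NA,
# so every NA ends up as its own run of length 1.
x_small <- c(NA, NA, 1, 1, NA, 0, 0)
rle(x_small)$lengths
#[1] 1 1 2 1 2

# With a sentinel (any value that cannot occur in the data; -999 here)
# consecutive NAs collapse into a single run.
x_sent <- x_small
x_sent[is.na(x_sent)] <- -999
runs <- rle(x_sent)$lengths
runs
#[1] 2 2 1 2

# One group id per run, repeated over the run's length
gr_small <- rep(seq_along(runs), runs)
gr_small
#[1] 1 1 2 2 3 4 4
```

The sentinel only has to differ from every real value in the data; -999 works here because 'x' contains only 0, 1 and NA.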

Or, instead of split, we can use which and diff to zero out every non-NA element whose predecessor is also non-NA, leaving only the first element of each run labelled.

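# Map 1 -> 1, 0 -> 2 and NA -> NA
x1 <- (!x) + 2*(!is.na(x)) - 1
# Non-NA positions; those directly preceded by another non-NA position continue a run
ind <- which(!is.na(x))
# Zero out the run continuations and the NAs
x1[c(ind[c(FALSE, diff(ind)==1)], which(is.na(x)))] <- 0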
x1
#[1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
#[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
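
The which/diff steps can be traced on a toy vector. This is only an illustrative walk-through; x_small, lab, cont and the second vector y are hypothetical names chosen for the sketch.

```r
x_small <- c(NA, 1, 1, NA, 0, 0)

# Step 1: map 1 -> 1, 0 -> 2, NA -> NA
lab <- (!x_small) + 2 * (!is.na(x_small)) - 1
lab
#[1] NA  1  1 NA  2  2

# Step 2: non-NA positions whose predecessor is also non-NA
# are continuations of a run, not starts
ind <- which(!is.na(x_small))
cont <- ind[c(FALSE, diff(ind) == 1)]
cont
#[1] 3 6

# Step 3: zero out the continuations and the NAs
lab[c(cont, which(is.na(x_small)))] <- 0
lab
#[1] 0 1 0 0 2 0

# Caveat: if a run of 0's directly follows a run of 1's with no NA
# in between, the start of the second run is treated as a continuation.
y <- c(1, 1, 0, 0)
lab_y <- (!y) + 2 * (!is.na(y)) - 1
ind_y <- which(!is.na(y))
lab_y[c(ind_y[c(FALSE, diff(ind_y) == 1)], which(is.na(y)))] <- 0
lab_y
#[1] 1 0 0 0
```

In the question's data every run is separated by NAs, so the caveat does not apply there. If adjacent runs can occur, the rle-based version is the safer choice, since it groups by value rather than by position.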

Or we can use rleid from data.table (v1.9.5+, the devel version at the time of writing) as the grouping variable. Within each run, replace the first element with 1 (for 1's) or 2 (for 0's) using match, fill the rest with 0, and then change the NA values to 0.

library(data.table)  # v1.9.5+
DT <- setDT(list(x))
DT[, c(match(V1[1], 1:0), rep(0, .N-1)) ,rleid(V1)][is.na(V1), V1:=0]$V1
#[1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
#[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

Upvotes: 1
