Reputation: 445
Let's say I have a matrix (or a vector) of the form
> set.seed(1)
> X = ifelse(matrix(runif(30), ncol = 2) > 0.4, 0, 1)
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 0 1
[4,] 0 0
[5,] 1 1
[6,] 0 0
[7,] 0 0
[8,] 0 0
[9,] 0 1
[10,] 1 0
[11,] 1 0
[12,] 1 0
[13,] 0 1
[14,] 1 0
[15,] 0 0
...
etc
How can I count the number of consecutive zeros between ones in each column, and replace the zeros with 1 in every run whose length is at most a predefined constant k? Or, at the very least, how can I get the start index and number of elements of each sequence of zeros? Generally there are many more zeros than ones in this data set, and most of the time the length of a sequence is greater than k.
So, for example, if k=1, then the zeros at [4,2], [13,1] and [15,1] are going to be replaced by 1. If k=2, then in addition to [4,2], [13,1] and [15,1], the zeros at [3,1], [4,1], [14,2] and [15,2] are going to be replaced by 1 as well in this example.
Of course, I can just run a loop and go through all the rows. I wonder if there is a package, or a neat vectorization trick that can do it.
Update:
Desired output for k=1
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 0 1
[4,] 0 1
[5,] 1 1
[6,] 0 0
[7,] 0 0
[8,] 0 0
[9,] 0 1
[10,] 1 0
[11,] 1 0
[12,] 1 0
[13,] 1 1
[14,] 1 0
[15,] 1 0
Desired output for k=2
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 0 0
[7,] 0 0
[8,] 0 0
[9,] 0 1
[10,] 1 0
[11,] 1 0
[12,] 1 0
[13,] 1 1
[14,] 1 1
[15,] 1 1
Upvotes: 2
Views: 499
Reputation: 66819
The run-length tool rle works here:
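fill_shortruns <- function(X, k = 1, badval = 0, newval = 1) {
  # work column by column
  apply(X, 2, function(x) {
    r <- rle(x)  # run-length encode the column
    # overwrite runs of badval that are at most k long
    r$values[r$lengths <= k & r$values == badval] <- newval
    inverse.rle(r)  # expand the runs back into a full-length vector
  })
}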
# smaller example
set.seed(1)
X0 = ifelse(matrix(runif(20), ncol = 4) > 0.4, 0, 1)
# [,1] [,2] [,3] [,4]
# [1,] 1 0 1 0
# [2,] 1 0 1 0
# [3,] 0 0 0 0
# [4,] 0 0 1 1
# [5,] 1 1 0 0
fill_shortruns(X0,2)
# [,1] [,2] [,3] [,4]
# [1,] 1 0 1 0
# [2,] 1 0 1 0
# [3,] 1 0 1 0
# [4,] 1 0 1 1
# [5,] 1 1 1 1
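The question also asks, as a fallback, for the start index and length of each sequence of zeros. The same rle output can give that too. Below is a minimal sketch, not part of the original answer: zero_runs is a hypothetical helper name, and the vector x is the first column of the question's matrix typed in by hand.

```r
# Hypothetical helper (an illustration, not from the answer above):
# report where each run of `badval` starts and how long it is.
zero_runs <- function(x, badval = 0) {
  r <- rle(x)
  ends <- cumsum(r$lengths)        # last index of every run
  starts <- ends - r$lengths + 1   # first index of every run
  keep <- r$values == badval       # keep only the runs of badval
  data.frame(start = starts[keep], length = r$lengths[keep])
}

# first column of the question's matrix, entered by hand
x <- c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0)
zero_runs(x)
#   start length
# 1     3      2
# 2     6      4
# 3    13      1
# 4    15      1
```

For a matrix, one option is to call it per column, e.g. lapply(seq_len(ncol(X)), function(j) zero_runs(X[, j])), which returns one data frame per column.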
Upvotes: 4