Reputation: 646
Suppose I have the following data.frame:
df <- data.frame(id=c("a","b","c","d","e","f"),
d0=c(1,1,0,1,1,0),
d1=c(0,0,0,0,1,1),
d2=c(0,0,1,1,1,1),
d3=c(1,1,0,1,1,1),
d4=c(1,0,1,0,0,1),
d5=c(1,1,1,1,1,1))
id d0 d1 d2 d3 d4 d5
1 a 1 0 0 1 1 1
2 b 1 0 0 1 0 1
3 c 0 0 1 0 1 1
4 d 1 0 1 1 0 1
5 e 1 1 1 1 0 1
6 f 0 1 1 1 1 1
How can I count, for each row, the maximum number of consecutive zeros enclosed between two 1s? For example:
1 0 1 --> 1
1 0 0 1 --> 2
0 1 --> 0
1 0 1 0 1 --> 1
1 0 1 0 0 1 --> 2
So the final output would be:
id d0 d1 d2 d3 d4 d5 final
a 1 0 0 1 1 1 2
b 1 0 0 1 0 1 2
c 0 0 1 0 1 1 1
d 1 0 1 1 0 1 1
e 1 1 1 1 0 1 1
f 0 1 1 1 1 1 0
Can someone help with this issue? Thanks!
Upvotes: 3
Views: 225
Reputation: 47300
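We can identify the groups of zeros with a cumsum over each row: every 1 starts a new group, and the zeros that follow it share its cumsum value. The exception is a cumsum of 0, which marks a group that is not valid because it didn't start with a 1.
We use tapply to count the zero values (i.e. sum !row, which is TRUE wherever the row is 0) by group and keep the max: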
apply(df[-1],1,function(row) max(tapply(!row,replace(x <- cumsum(row),!x,NA),sum)))
# [1] 2 2 1 1 1 0
Here's a more detailed version:
cs <- apply(df[-1],1,cumsum)
cs[cs==0] <- NA
sapply(seq(nrow(df)),function(i) max(tapply(!df[i,-1],cs[,i],sum)))
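To make the grouping concrete, here is a minimal trace of the same steps on a single row (row "c" from the question, typed in by hand as a plain vector). The variable names grp and zeros_per_group are only illustrative and are not part of the answer above.

```r
# Row "c" from the question, as a plain numeric vector
row <- c(0, 0, 1, 0, 1, 1)

# Each 1 starts a new group; leading zeros fall into group 0
x <- cumsum(row)
x
# [1] 0 0 1 1 2 3

# Group 0 did not start with a 1, so mark it as invalid (NA)
grp <- replace(x, x == 0, NA)

# Count the zeros (TRUE in !row) inside each valid group;
# tapply drops the NA group automatically
zeros_per_group <- tapply(!row, grp, sum)
zeros_per_group
# 1 2 3
# 1 0 0

max(zeros_per_group)
# [1] 1
```

Note that the last group is never checked for a closing 1, so a row ending in zeros would have those trailing zeros counted as well. That case does not occur in the sample data, where d5 is always 1.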
Upvotes: 1
Reputation: 38500
Here is a method using apply and rle after converting your data.frame to a matrix (excluding the ID).
# convert data to matrix
myMat <- data.matrix(df[-1])
Now, get the counts. The lengths of the first and last runs are set to 0, since the goal is to count only the 0s that lie between 1s, and a run at either end of the row is not enclosed on both sides.
# get the counts
apply(myMat, 1,
      function(x) {
        # get run lengths of 0s and 1s
        tmp <- rle(x)
        # set the lengths of the first and last runs to 0
        tmp$lengths[c(1, length(tmp$lengths))] <- 0
        # return maximum count of 0s
        max(tmp$lengths[tmp$values==0])
      })
This returns
[1] 2 2 1 1 1 0
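As an illustration of what rle returns, here is a small sketch that traces one row (row "c" from the question) and wraps the same steps in a helper. The name max_enclosed_zeros and the guard for rows without any run of 0s are assumptions added for this example, not part of the answer above.

```r
# Row "c" from the question, as a plain numeric vector
x <- c(0, 0, 1, 0, 1, 1)

tmp <- rle(x)
tmp$lengths
# [1] 2 1 1 2
tmp$values
# [1] 0 1 0 1

# Hypothetical helper: same rle idea, plus a guard so that a row
# with no run of 0s returns 0 instead of -Inf with a warning
max_enclosed_zeros <- function(x) {
  tmp <- rle(x)
  # runs at either end are not enclosed by 1s on both sides
  tmp$lengths[c(1, length(tmp$lengths))] <- 0
  zero_runs <- tmp$lengths[tmp$values == 0]
  if (length(zero_runs) == 0) 0 else max(zero_runs)
}

max_enclosed_zeros(x)
# [1] 1
max_enclosed_zeros(c(1, 1, 1))
# [1] 0
```

Zeroing the first run is what discards the two leading 0s here, leaving the single enclosed 0 as the longest run.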
Upvotes: 3
Reputation: 2216
I created an auxiliary function to find the maximum number of zeros between 2 ones.
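count_zeros <- function(vec){
  # positions of the ones in the row
  pos_ones <- which(vec == 1)
  # with fewer than two ones, no zeros can be enclosed
  if(length(pos_ones) < 2) return(0)
  count_zero <- NULL
  for(i in seq_len(length(pos_ones)-1)){
    # count the zeros between each pair of consecutive ones
    count_zero <- c(count_zero, sum(vec[pos_ones[i]:pos_ones[i+1]] == 0))
  }
  return(max(count_zero))
}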
It loops over the consecutive pairs of ones found in the vector vec, counts the zeros between each pair, and returns the maximum count. With this it is easy to loop over the whole data frame. This is an approach with sapply:
sapply(1:nrow(df), function(x) count_zeros(df[x,-1]))
The result is:
[1] 2 2 1 1 1 0
which is what you expect.
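To get the final column the question asks for, the per-row result can be assigned back to the data frame. The sketch below is a compact variant rather than the loop above: it uses diff() on the positions of the ones, which assumes every entry is either 0 or 1, so that the gap between two neighbouring ones minus one equals the number of zeros between them.

```r
df <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                 d0 = c(1, 1, 0, 1, 1, 0),
                 d1 = c(0, 0, 0, 0, 1, 1),
                 d2 = c(0, 0, 1, 1, 1, 1),
                 d3 = c(1, 1, 0, 1, 1, 1),
                 d4 = c(1, 0, 1, 0, 0, 1),
                 d5 = c(1, 1, 1, 1, 1, 1))

# Variant of count_zeros based on diff(); assumes 0/1 data only
count_zeros <- function(vec) {
  pos_ones <- which(vec == 1)
  # with fewer than two ones, no zeros can be enclosed
  if (length(pos_ones) < 2) return(0)
  # distance between neighbouring ones, minus one, = zeros in between
  max(diff(pos_ones) - 1)
}

# apply() passes each row of the 0/1 columns as a numeric vector
df$final <- apply(df[-1], 1, count_zeros)
df$final
# [1] 2 2 1 1 1 0
```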
Upvotes: 4