221B

Reputation: 851

Count of consecutive zeros in a dataframe

Below is my dataframe. It has row names and column names.

       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
   row1 0 0 0 1 0 0 1 0 0  0  0  0  0  0  0
   row2 0 0 0 1 1 1 1 1 1  1  1  1  1  1  0 

I would like to derive a column, test, that counts the consecutive zeros at the end of each row, that is, starting from the last column (15) and moving backwards until the first non-zero value. Below is an example. For the first row, there are 8 consecutive trailing zeros, so the value in the test column should be 8. For the second row, the result should be 1, as only the last column is zero.

       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 test
   row1 0 0 0 1 0 0 1 0 0  0  0  0  0  0  0  8
   row2 0 0 0 1 1 1 1 1 1  1  1  1  1  1  0  1

What's the best way to achieve this?

Upvotes: 0

Views: 2519

Answers (2)

Mike H.

Reputation: 14360

You could simply find the index of the first value that does not equal 0 (starting from the last column) and then subtract one:

df$test2 <- apply(df[,ncol(df):1]==0, 1, which.min) - 1

df
#  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 test2
#1 0 0 0 1 0 0 1 0 0  0  0  0  0  0  0     8
#2 0 0 0 1 1 1 1 1 1  1  1  1  1  1  0     1
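
One caveat with which.min: a row made up entirely of zeros contains no FALSE, so which.min returns 1 and the count comes out as 0 instead of the number of columns. Below is a minimal sketch of a guarded variant. The helper name count_trailing_zeros, the rebuilt df, and the extra row3 are assumptions made for illustration and are not part of the answer above.

```r
# Rebuild the question's data (assumed layout: numeric columns, named rows).
df <- as.data.frame(rbind(
  row1 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  row2 = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  row3 = rep(0, 15)  # hypothetical all-zero row, added to exercise the guard
))

# Hypothetical helper: count the zeros at the end of a numeric vector.
count_trailing_zeros <- function(x) {
  nonzero <- which(rev(x) != 0)  # positions of non-zero values, counted from the end
  if (length(nonzero) == 0) {
    length(x)                    # no non-zero value: every entry is a trailing zero
  } else {
    nonzero[1] - 1               # zeros passed before the first non-zero value
  }
}

test <- apply(as.matrix(df), 1, count_trailing_zeros)
test  # row1 = 8, row2 = 1, row3 = 15
```

The computation is kept separate from df here, so the result can be attached with df$test <- test once it looks right.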

Another answer:

Since I was curious about a way to do this without apply-ing over rows, I came up with an (admittedly complicated) Reduce solution. It's not a solution I recommend, but I was interested to see whether it could be done:

iniCol <- setNames(df[,ncol(df)] == 0, as.numeric(df[,ncol(df)] == 0))
df$test2 <- Reduce(function(ini, add) {temp <- ifelse(pmin(as.numeric(names(ini)), add==0) == 0, ini, rowSums(cbind(ini, add == 0)))
                                       ini  <- setNames(temp, pmin(as.numeric(names(ini)), add==0))}, 
                   df[,(ncol(df)-1):1], 
                   ini = iniCol)

The idea behind this is to use the names attribute to track, for each row, whether every column visited so far (moving from right to left) was 0. Once a non-zero value has been seen we stop counting for that row; otherwise we continue counting.
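
The same bookkeeping can be sketched with two plain vectors in place of the names attribute. This is a re-expression of the idea, not the code above: the variable names still_zero and counts are made up for illustration, and df is rebuilt from the question's data.

```r
# Rebuild the question's data (assumed layout: numeric columns, named rows).
df <- as.data.frame(rbind(
  row1 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  row2 = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
))

still_zero <- rep(TRUE, nrow(df))  # TRUE while a row has shown only zeros so far
counts <- integer(nrow(df))        # running count of trailing zeros per row

# Walk the columns from last to first, updating all rows at once.
for (j in rev(seq_len(ncol(df)))) {
  still_zero <- still_zero & (df[[j]] == 0)  # a non-zero value ends the run for good
  counts <- counts + still_zero              # TRUE adds 1, FALSE adds 0
}
counts  # 8 1
```

The loop runs once per column rather than once per row, which is the property the Reduce version was after.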

Upvotes: 1

pogibas

Reputation: 28339

Solution using rle:

getConsecZeroRle <- function(x) {
    foo <- rle(x)
    foo$lengths[tail(which(foo$values), 1)]
}
result <- apply(df[, -1] == 0, 1, function(x) getConsecZeroRle(x))
df$test <- as.numeric(result)
df$test[is.na(df$test)] <- 0

Explanation:

Use apply to iterate over the rows of a subset of your dataframe (df[, -1] drops the first column, which holds the row labels). For each row, compute the runs of consecutive zeros with rle and extract the length of the last such run using tail. Rows that don't contain any zeros produce NA, so is.na(df$test) is used to replace those with zeros.
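
To make the rle step concrete, here is what it returns for row1. Note that the last run of zeros is the trailing run only when the row ends in a zero, so the sketch also includes a variant that checks the final run directly. The name trailing_zeros_rle is hypothetical and not part of the solution above.

```r
x <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0) == 0  # row1 as a logical vector
runs <- rle(x)
runs$values   # TRUE FALSE TRUE FALSE TRUE
runs$lengths  # 3 1 2 1 8

# Hypothetical variant: look only at the final run, and count it
# only if it is a run of zeros (TRUE).
trailing_zeros_rle <- function(x) {
  runs <- rle(x)
  n <- length(runs$values)
  if (runs$values[n]) runs$lengths[n] else 0L
}

trailing_zeros_rle(x)                     # 8
trailing_zeros_rle(c(TRUE, TRUE, FALSE))  # 0, the row ends in a non-zero value
```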


Solution using sum:

getConsecZeroSum <- function(x) {
    x[1:tail(which(!x), 1)] <- FALSE
    sum(x)
}
df$test <- apply(df[, -1] == 0, 1, function(x) getConsecZeroSum(x))

Explanation:

Find the position of the last FALSE (non-zero) value in each row and set everything up to and including it to FALSE (x[1:tail(which(!x), 1)] <- FALSE), then use sum to count the zeros that remain at the end.

Result:

#      a 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 test
# 1 row1 0 0 0 1 0 0 1 0 0  0  0  0  0  0  0    8
# 2 row2 0 0 0 1 1 1 1 1 1  1  1  1  1  1  0    1
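
The masking step in the sum solution can be traced on a single row. The sketch below also guards the case of a row with no non-zero value, where tail(which(!x), 1) is empty and the 1:... index fails. The name trailing_zeros_sum is hypothetical and the guard is an addition, not part of the solution above.

```r
# Hypothetical helper: x is a logical vector, TRUE where the original value is 0.
trailing_zeros_sum <- function(x) {
  last_nonzero <- max(0, which(!x))  # 0 when the row has no non-zero value
  x[seq_len(last_nonzero)] <- FALSE  # clear every zero at or before that position
  sum(x)                             # the remaining TRUEs are the trailing zeros
}

row1 <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0) == 0
trailing_zeros_sum(row1)          # 8
trailing_zeros_sum(rep(TRUE, 5))  # 5, an all-zero row
```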

Upvotes: 4
