lolo

Reputation: 646

R sum over rows - Data.frame

Suppose I have the following data.frame:

df <- data.frame(id=c("a","b","c","d","e","f"),
                 d0=c(1,1,0,1,1,0),
                 d1=c(0,0,0,0,1,1),
                 d2=c(0,0,1,1,1,1),
                 d3=c(1,1,0,1,1,1),
                 d4=c(1,0,1,0,0,1),
                 d5=c(1,1,1,1,1,1))

  id d0 d1 d2 d3 d4 d5
1  a  1  0  0  1  1  1
2  b  1  0  0  1  0  1
3  c  0  0  1  0  1  1
4  d  1  0  1  1  0  1
5  e  1  1  1  1  0  1
6  f  0  1  1  1  1  1

How can I count, for each row, the maximum number of consecutive zeros between two 1s? For example:

1 0 1 --> 1
1 0 0 1 --> 2
0 1 --> 0
1 0 1 0 1 --> 1
1 0 1 0 0 1 --> 2
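The rule behind these examples is that a run of 0s counts only when it has a 1 on both sides. As a minimal sketch (not from the original post), that rule can be written as a Perl-compatible regular expression with a lookbehind and a lookahead, assuming each row is first collapsed into a string of 0/1 digits. max_zero_gap_regex is a hypothetical helper name.

```r
# Hypothetical helper: longest run of 0s that has a 1 on both sides.
# Assumes x contains only the values 0 and 1.
max_zero_gap_regex <- function(x) {
  s <- paste(x, collapse = "")                         # c(1, 0, 0, 1) -> "1001"
  m <- gregexpr("(?<=1)0+(?=1)", s, perl = TRUE)[[1]]  # runs of 0s between two 1s
  len <- attr(m, "match.length")                       # -1 when nothing matches
  max(0, len)
}

max_zero_gap_regex(c(1, 0, 1))           # 1
max_zero_gap_regex(c(1, 0, 0, 1))        # 2
max_zero_gap_regex(c(0, 1))              # 0
max_zero_gap_regex(c(1, 0, 1, 0, 1))     # 1
max_zero_gap_regex(c(1, 0, 1, 0, 0, 1))  # 2
```

Applied row-wise with apply(df[-1], 1, max_zero_gap_regex), it should reproduce the final column shown in the question (2 2 1 1 1 0). Because the lookaround requires a 1 after the run, trailing 0s such as in 1 0 0 are not counted.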

So the final output would be:

  id d0 d1 d2 d3 d4 d5 final
  a  1  0  0  1  1  1     2
  b  1  0  0  1  0  1     2
  c  0  0  1  0  1  1     1
  d  1  0  1  1  0  1     1
  e  1  1  1  1  0  1     1
  f  0  1  1  1  1  1     0

Can someone help with this issue? Thanks!

Upvotes: 3

Views: 225

Answers (3)

moodymudskipper

Reputation: 47300

We can identify the groups of zeros by taking the cumsum along each row: every 1 starts a new group. Where the cumsum is still 0, the group is not valid, since those zeros are not preceded by a 1.

We use tapply to count the zeros in each group (summing !row, which is TRUE where the value is 0) and keep the max:

apply(df[-1],1,function(row) max(tapply(!row,replace(x <- cumsum(row),!x,NA),sum)))
# [1] 2 2 1 1 1 0
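To see what the grouping does, here is the same computation unrolled for row c (0 0 1 0 1 1). It is only an illustration of the one-liner above, with the intermediate values written in the comments.

```r
row <- c(d0 = 0, d1 = 0, d2 = 1, d3 = 0, d4 = 1, d5 = 1)  # row "c" of df

x <- cumsum(row)                  # 0 0 1 1 2 3: each 1 opens a new group
grp <- replace(x, !x, NA)         # NA NA 1 1 2 3: 0s before the first 1 get no group
counts <- tapply(!row, grp, sum)  # 0s per group: 1 0 0 (NA positions are dropped)
max(counts)                       # 1
```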

Here's a more detailed version:

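cs <- apply(df[-1],1,cumsum)   # one column per row of df: running count of 1s
cs[cs==0] <- NA                # 0s before the first 1 belong to no group
sapply(seq(nrow(df)),function(i) max(tapply(!df[i,-1],cs[,i],sum)))   # largest count of 0s per group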
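One caveat: each group runs from a 1 up to the next 1, so 0s after the last 1 are counted as well (for example, 1 0 1 0 0 gives 2 instead of 1). Every row in the example ends in 1, so the output above is unaffected. Below is a minimal sketch that also excludes the trailing run, assuming rows contain only 0s and 1s; max_zero_gap_cumsum is a hypothetical name.

```r
# Hypothetical variant of the cumsum/tapply approach that ignores 0s
# before the first 1 and after the last 1. Assumes row holds only 0s and 1s.
max_zero_gap_cumsum <- function(row) {
  if (!any(row == 1)) return(0)                     # no 1 at all: nothing lies between 1s
  grp <- cumsum(row)                                # each 1 opens a new group
  grp[grp == 0] <- NA                               # 0s before the first 1
  grp[seq_along(row) > max(which(row == 1))] <- NA  # 0s after the last 1
  max(tapply(!row, grp, sum))                       # longest run of 0s inside a group
}

max_zero_gap_cumsum(c(1, 0, 1, 0, 0))  # 1
max_zero_gap_cumsum(c(1, 0, 0, 1))     # 2
```

It can be used in the same way as the one-liner: apply(df[-1], 1, max_zero_gap_cumsum).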

Upvotes: 1

lmo

Reputation: 38500

Here is a method using apply and rle after converting your data.frame to a matrix (excluding the ID).

# convert data to matrix
myMat <- data.matrix(df[-1])

Now, get the counts. The lengths of the first and last runs are set to 0, since the goal is to count only the 0s that lie between 1s, and leading or trailing 0s do not.

# get the counts
apply(myMat, 1,
      function(x) {
        # get run lengths of 0s and 1s
        tmp <- rle(x)
        # set the lengths of the first and last runs to 0,
        # since leading and trailing 0s are not between two 1s
        tmp$lengths[c(1, length(tmp$lengths))] <- 0
        # return maximum count of 0s
        max(tmp$lengths[tmp$values==0])
      })

This returns

[1] 2 2 1 1 1 0
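For reference, rle(c(1, 0, 0, 1, 0, 1)) (row b) has lengths 1 2 1 1 1 and values 1 0 1 0 1; zeroing the first and last lengths leaves interior 0-runs of length 2 and 1, hence 2. One edge case: for a row made up entirely of 1s there is no run with value 0, so max() receives an empty vector and returns -Inf with a warning. A small guarded variant, sketched under the assumption that rows hold only 0s and 1s (max_zero_run is a hypothetical name), prepends a 0 before taking the maximum:

```r
# Hypothetical guarded version of the rle approach.
max_zero_run <- function(x) {
  tmp <- rle(x)                                # run lengths of 0s and 1s
  tmp$lengths[c(1, length(tmp$lengths))] <- 0  # ignore leading/trailing runs
  max(c(0, tmp$lengths[tmp$values == 0]))      # the extra 0 avoids max() of an empty vector
}

max_zero_run(c(1, 0, 0, 1, 0, 1))  # 2
max_zero_run(c(1, 1, 1))           # 0 (the unguarded version gives -Inf and a warning)
```

With myMat from above, apply(myMat, 1, max_zero_run) gives the same result for the example data.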

Upvotes: 3

Alejandro Andrade

Reputation: 2216

I created an auxiliary function to find the maximum number of zeros between two 1s.

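count_zeros <- function(vec){
  vec <- unlist(vec)                    # also accept a one-row data.frame
  pos_ones <- which(vec == 1)
  if (length(pos_ones) < 2) return(0)   # fewer than two 1s: no zeros between 1s
  count_zero <- NULL
  for(i in seq_len(length(pos_ones) - 1)){
    # number of zeros between the i-th and the (i+1)-th 1
    count_zero <- c(count_zero, sum(vec[pos_ones[i]:pos_ones[i+1]] == 0))
  }
  return(max(count_zero))
}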

It loops over consecutive pairs of 1s found in the vector vec, counts the zeros between each pair, and returns the maximum. With this, it is easy to loop over the whole data frame, for example with sapply:

sapply(1:nrow(df), function(x) count_zeros(df[x,-1]))

The result is:

[1] 2 2 1 1 1 0

which is what you expected.
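Because every position between two consecutive 1s is by definition a 0, the loop can also be replaced by diff() on the positions of the 1s. A minimal vectorised sketch, assuming each row holds only 0s and 1s; count_zeros_diff is a hypothetical name, and the last lines add the result as the final column requested in the question:

```r
df <- data.frame(id=c("a","b","c","d","e","f"),
                 d0=c(1,1,0,1,1,0),
                 d1=c(0,0,0,0,1,1),
                 d2=c(0,0,1,1,1,1),
                 d3=c(1,1,0,1,1,1),
                 d4=c(1,0,1,0,0,1),
                 d5=c(1,1,1,1,1,1))

# Hypothetical vectorised version of count_zeros (assumes only 0s and 1s).
count_zeros_diff <- function(vec) {
  pos_ones <- which(unlist(vec) == 1)  # positions of the 1s
  if (length(pos_ones) < 2) return(0)  # fewer than two 1s: no gap to measure
  max(diff(pos_ones) - 1)              # gap between consecutive 1s = number of 0s
}

df$final <- sapply(seq_len(nrow(df)), function(i) count_zeros_diff(df[i, -1]))
df$final
# [1] 2 2 1 1 1 0
```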

Upvotes: 4
