piptoma
piptoma

Reputation: 796

Count sequences of numbers rowwise

I have the following dataframe with 0, 1, and NAs for IDs A to E over a one year period:

dat <- data.frame(
 id = c("A", "B", "C", "D", "E"),
 jan = c(0, 0, NA, 1, 0),
 feb = c(0, 1, 1, 0, 0),
 mar = c(0, 0, 1, 0, 1),
 apr = c(0, NA, 0, NA, 1),
 may = c(0, NA, 0, 0, 0),
 jun = c(0, 0, 0, 0, 0),
 jul = c(0, 0, 0, 0, 1),
 aug = c(NA, 0, 0, 1, 1),
 sep = c(NA, 0, 0, 1, NA),
 okt = c(NA, 0, 0, 0, NA),
 nov = c(NA, 0, 0, 0, 1),
 dez = c(NA, 0, 0, 0, 0)
)

> dat
  id jan feb mar apr may jun jul aug sep okt nov dez
   A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA
   B   0   1   0  NA  NA   0   0   0   0   0   0   0
   C  NA   1   1   0   0   0   0   0   0   0   0   0
   D   1   0   0  NA   0   0   0   1   1   0   0   0
   E   0   0   1   1   0   0   1   1  NA  NA   1   0

I would like to count the number of 1s for each ID over this one year period, but the following conditions need to be met:

In my example, the count would be:

> dat
   id jan feb mar apr may jun jul aug sep okt nov dez     count
 1  A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA      => 0
 2  B   0   1   0  NA  NA   0   0   0   0   0   0   0      => 1
 3  C  NA   1   1   0   0   0   0   0   0   0   0   0      => 1
 4  D   1   0   0  NA   0   0   0   1   1   0   0   0      => 2
 5  E   0   0   1   1   0   0   1   1  NA  NA   1   0      => 1

The function should be applied rowwise in the form of apply(dat[, -1], 1, my_fun) and return a vector containing the count (i.e. 0, 1, 1, 2, 1). Does anybody have an idea how to achieve this?

Upvotes: 4

Views: 134

Answers (4)

Sotos
Sotos

Reputation: 51582

Since your data set is for months, then with 12 months you can only have one pattern where 1 will count as a second 1, so the maximum number of 1s you will ever have is two. In this case then you don't need any sort of loop. We can do this in a fully Vectorized way, i.e.

#Create the pattern to accept 6 or more 0 before the second 1
#Compliments of @DavidArenburg
ptn <- "10{6,}1"


replace(grepl(ptn, do.call(paste0, dat[-1]))+1, rowSums(dat[-1]) == 0, 0)
#[1] 0 1 1 2 1

Or to make it a function,

get_counts <- function(df, ptn = "10{6,}1"){
  v1 <- paste0(ptn, collapse = '')
  replace(grepl(v1, do.call(paste0, df[-1]))+1, rowSums(df[-1]) == 0, 0)
}

get_counts(dat)
#[1] 0 1 1 2 1

Upvotes: 2

alexis_laz
alexis_laz

Reputation: 13122

A straightforward way to approach this is to simply loop over the numbers of each row and check the previous entries to decide whether or not a found "1" is to be counted. R's operators are vectorized, so either looping over 12 numbers or 12 series of numbers make no difference in approaching the problem.

All needed is to keep track of the last one seen:

last_seen_one = integer(nrow(dat))

and of the accumulating number of ones:

ones_nr = integer(nrow(dat))

Then, translating into a very simple algorithm, like:

for(j in 2:length(dat)) {
    has_one = dat[[j]] == 1L
    no_one = !last_seen_one
    i = which(has_one & (no_one | ((j - last_seen_one) >= 6)))
    ones_nr[i] = ones_nr[i] + 1L
    last_seen_one[has_one] = j
}

We get:

ones_nr
#[1] 0 1 1 2 1

This way, only a loop over 12 months/columns is needed instead of a loop over each id/row.

Upvotes: 1

demirev
demirev

Reputation: 195

How about using rollapply from the zoo package:

library(zoo)
library(magrittr)

myfun <- function(y, pattern = c(0,0,0,0,0,0,1)){
    y[is.na(y)] <- 0 # to account for both 0s and NAs
    first <- sum(y[1:(length(pattern)-1)])!=0
    rest  <- y %>% as.numeric() %>% rollapply(7, identical, pattern) %>% sum
    return(first+rest)
}

apply(dat[,-1],1,myfun)

[1] 0 1 1 2 1

The rollapply part will match any sequence of six 0s followed by a 1 in each row.

The only thing left is to account for 1s in the first 6 months (which you want to count but won't be matched by the rollapply). This is done with the first row of myfun.

Upvotes: 4

Florian
Florian

Reputation: 25385

I am going to make use of the fact that your function can return a maximum of 2 per row, since there can never be more than one such sequences of six zeroes. It will reach the maximum if there is a sequence of at least six zeroes somewhere, which does not start at the beginning or end at the end of the row (since then it is surrounded by a 1 on both sides.)

yoursum <- function(x)
{
  x[is.na(x)]<-0
  booleans = with(rle(x),values==0 & lengths>5)
  if(any(booleans))
  {
    if(which(booleans)<length(booleans) & which(booleans)>1 )
      return(2)
  }

  if(any(x>0))
    return(1)
  else
    return(0)
}

apply(dat[,-1],1,yoursum)

Output:

[1] 0 1 1 2 1

Upvotes: 2

Related Questions