Reputation: 796
I have the following dataframe with 0, 1, and NAs for IDs A to E over a one year period:
dat <- data.frame(
id = c("A", "B", "C", "D", "E"),
jan = c(0, 0, NA, 1, 0),
feb = c(0, 1, 1, 0, 0),
mar = c(0, 0, 1, 0, 1),
apr = c(0, NA, 0, NA, 1),
may = c(0, NA, 0, 0, 0),
jun = c(0, 0, 0, 0, 0),
jul = c(0, 0, 0, 0, 1),
aug = c(NA, 0, 0, 1, 1),
sep = c(NA, 0, 0, 1, NA),
okt = c(NA, 0, 0, 0, NA),
nov = c(NA, 0, 0, 0, 1),
dez = c(NA, 0, 0, 0, 0)
)
> dat
id jan feb mar apr may jun jul aug sep okt nov dez
A 0 0 0 0 0 0 0 NA NA NA NA NA
B 0 1 0 NA NA 0 0 0 0 0 0 0
C NA 1 1 0 0 0 0 0 0 0 0 0
D 1 0 0 NA 0 0 0 1 1 0 0 0
E 0 0 1 1 0 0 1 1 NA NA 1 0
I would like to count the number of 1s for each ID over this one year period, but the following conditions need to be met:
In my example, the count would be:
> dat
id jan feb mar apr may jun jul aug sep okt nov dez count
1 A 0 0 0 0 0 0 0 NA NA NA NA NA => 0
2 B 0 1 0 NA NA 0 0 0 0 0 0 0 => 1
3 C NA 1 1 0 0 0 0 0 0 0 0 0 => 1
4 D 1 0 0 NA 0 0 0 1 1 0 0 0 => 2
5 E 0 0 1 1 0 0 1 1 NA NA 1 0 => 1
The function should be applied rowwise in the form of apply(dat[, -1], 1, my_fun)
and return a vector containing the count (i.e. 0, 1, 1, 2, 1
). Does anybody have an idea how to achieve this?
Upvotes: 4
Views: 134
Reputation: 51582
Since your data set is for months, then with 12 months you can only have one pattern where 1 will count as a second 1, so the maximum number of 1s you will ever have is two. In this case then you don't need any sort of loop. We can do this in a fully Vectorized way, i.e.
#Create the pattern to accept 6 or more 0 before the second 1
#Compliments of @DavidArenburg
ptn <- "10{6,}1"
replace(grepl(ptn, do.call(paste0, dat[-1]))+1, rowSums(dat[-1]) == 0, 0)
#[1] 0 1 1 2 1
Or to make it a function,
get_counts <- function(df, ptn = "10{6,}1"){
v1 <- paste0(ptn, collapse = '')
replace(grepl(v1, do.call(paste0, df[-1]))+1, rowSums(df[-1]) == 0, 0)
}
get_counts(dat)
#[1] 0 1 1 2 1
Upvotes: 2
Reputation: 13122
A straightforward way to approach this is to simply loop over the numbers of each row and check the previous entries to decide whether or not a found "1" is to be counted. R's operators are vectorized, so either looping over 12 numbers or 12 series of numbers make no difference in approaching the problem.
All needed is to keep track of the last one seen:
last_seen_one = integer(nrow(dat))
and of the accumulating number of ones:
ones_nr = integer(nrow(dat))
Then, translating into a very simple algorithm, like:
for(j in 2:length(dat)) {
has_one = dat[[j]] == 1L
no_one = !last_seen_one
i = which(has_one & (no_one | ((j - last_seen_one) >= 6)))
ones_nr[i] = ones_nr[i] + 1L
last_seen_one[has_one] = j
}
We get:
ones_nr
#[1] 0 1 1 2 1
This way, only a loop over 12 months/columns is needed instead of a loop over each id/row.
Upvotes: 1
Reputation: 195
How about using rollapply
from the zoo package:
library(zoo)
library(magrittr)
myfun <- function(y, pattern = c(0,0,0,0,0,0,1)){
y[is.na(y)] <- 0 # to account for both 0s and NAs
first <- sum(y[1:(length(pattern)-1)])!=0
rest <- y %>% as.numeric() %>% rollapply(7, identical, pattern) %>% sum
return(first+rest)
}
apply(dat[,-1],1,myfun)
[1] 0 1 1 2 1
The rollapply part will match any sequence of six 0s followed by a 1 in each row.
The only thing left is to account for 1s in the first 6 months (which you want to count but won't be matched by the rollapply). This is done with the first row of myfun
.
Upvotes: 4
Reputation: 25385
I am going to make use of the fact that your function can return a maximum of 2 per row, since there can never be more than one such sequences of six zeroes. It will reach the maximum if there is a sequence of at least six zeroes somewhere, which does not start at the beginning or end at the end of the row (since then it is surrounded by a 1 on both sides.)
yoursum <- function(x)
{
x[is.na(x)]<-0
booleans = with(rle(x),values==0 & lengths>5)
if(any(booleans))
{
if(which(booleans)<length(booleans) & which(booleans)>1 )
return(2)
}
if(any(x>0))
return(1)
else
return(0)
}
apply(dat[,-1],1,yoursum)
Output:
[1] 0 1 1 2 1
Upvotes: 2