Rfanatic
Rfanatic

Reputation: 2282

How to identify consecutive observations based on time steps?

I would like to identify if an activity occurs consecutive times and how often during a week. The starting point is t1 that records the occurrence of an activity at t1_1 , t1_2, t1_3 and so on. For example in the case of id 12 activity occurred at t1_2, t1_3, t2_2, t3_1, t3_3, t4_2, t5_2, t6_1, t6_2, t6_3 and t7_3. As here was reported activity during all 7 days I assume the activity occurred consecutively. I would like to identify all id's in which an activity occured consecutively and the sum of occurrence.

Input

id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12  0    1     1    0     1   0    1    0    1    0    1    0    0    1    0     1   1     1   0      0  1
123 0    0     0    1     1   1    0    0    0    1    1    1    1    1    1     0   0     0    1     1  1
 10  1   1     1    1     1    1    1   1    1    1    1    1    1    1    1     1   1     1    1     1  1   

Output

Id   Sum 
12    11
10    21

Upvotes: 1

Views: 77

Answers (2)

chinsoon12
chinsoon12

Reputation: 25225

An option using data.table:

melt(DT, id.vars="id")[, 
    c("day", "time") := tstrsplit(variable, "_")][
        value==1L, if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value)) , id]

output:

   id Sum
1: 10  21
2: 12  11

data:

library(data.table)
DT <- fread("id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12  0    1     1    0     1   0    1    0    1    0    1    0    0    1    0     1   1     1   0      0  1
123 0    0     0    1     1   1    0    0    0    1    1    1    1    1    1     0   0     0    1     1  1
10  1   1     1    1     1    1    1   1    1    1    1    1    1    1    1     1   1     1    1     1  1")

Explanation:

  • Convert into long format using melt
  • use tstrsplit to split the columns names into day of the week and time
  • filter for value==1L and then for each id, check if all 7 days are in subset before summing (i.e. if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value)))

Upvotes: 1

akrun
akrun

Reputation: 887213

Here is an option with rle. Loop over the rows of the dataset with apply (MARGIN = 1) without the 'id' column, apply rle and extract the lengths where the 'values' are 1 ('x1'). If the length of 'x1' is either 1 or greater than or equal to 7, get the sum (1 is because if all the values are 1). Then, stack the named list to a 2 column data.frame and set the names of the columns ('out')

out <- stack(setNames(apply(df1[-1], 1, function(x) {
      x1 <- with(rle(x), lengths[as.logical(values)])
     if(length(x1) >=7|length(x1) == 1) sum(x1) }), df1$id))[2:1]
names(out) <- c('Id', 'Sum')
out
#  Id Sum
#1 12  11
#2 10  21

data

df1 <- structure(list(id = c(12L, 123L, 10L), t1_1 = c(0L, 0L, 1L), 
    t1_2 = c(1L, 0L, 1L), t1_3 = c(1L, 0L, 1L), t2_1 = c(0L, 
    1L, 1L), t2_2 = c(1L, 1L, 1L), t2_3 = c(0L, 1L, 1L), t3_1 = c(1L, 
    0L, 1L), t3_2 = c(0L, 0L, 1L), t3_3 = c(1L, 0L, 1L), t4_1 = c(0L, 
    1L, 1L), t4_2 = c(1L, 1L, 1L), t4_3 = c(0L, 1L, 1L), t5_1 = c(0L, 
    1L, 1L), t5_2 = c(1L, 1L, 1L), t5_3 = c(0L, 1L, 1L), t6_1 = c(1L, 
    0L, 1L), t6_2 = c(1L, 0L, 1L), t6_3 = c(1L, 0L, 1L), t7_1 = c(0L, 
    1L, 1L), t7_2 = c(0L, 1L, 1L), t7_3 = c(1L, 1L, 1L)), 
    class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Related Questions