Tareva

Reputation: 230

Event Counter with condition

Below is a data frame df with one variable, ID, and 500K data points. I need to implement an event counter with the following conditions:
1. Increment the event counter when ID == A.
2. The first 3 data points of each run of ID == A should not increment the counter, even though ID == A.
The data frame df with the expected output is shown below:

ID       Event Counter  
D          0  
F          0  
V          0
A          0  
A          0  
A          0
A          1  
A          1  
A          1
V          1  
F          1  
A          1
A          1
A          1  
A          2  
F          2  
G          2 
A          2  
A          2  
A          2  
A          3  
A          3  

Please note: rows 1, 2 and 3 don't satisfy the condition, hence no increment in the event counter. Although ID == A in rows 4, 5 and 6, the event counter does not increment (reference: condition 2). The same applies to rows 12, 13 and 14.

I found a similar question where the counter increments on every data point that satisfies the condition, but my conditions are different.

Upvotes: 2

Views: 245

Answers (3)

lmo

Reputation: 38510

Here is a base R alternative using split and lapply.

dat$v3 <-
  cumsum(unlist(lapply(split(dat$ID,
                           # run id: one group per run of identical IDs
                           with(rle(as.character(dat$ID)), rep(seq_along(values), lengths))),
                       function(x) {
                         v <- length(x)
                         # flag only the 4th element of a run of "A"s longer than 3
                         if(x[1] == "A" && v > 3) rep(c(0, 1, 0), c(3, 1, v-4))
                         else rep(0, v)
                       })))

The ID variable is split using a method similar to that in docendo-discimus's answer, splitting on runs of the same ID. This list is fed to lapply, which checks whether the group is composed of As and whether it has more than 3 elements. If so, a vector of 3 0s, followed by a 1, followed by 0s for the remaining elements is returned, matching the length of the group. If the check fails, a vector of 0s of the proper length is returned. The cumulative sum of the concatenated vectors is the counter.

This returns

dat
   ID Event_Counter v3
1   D             0  0
2   F             0  0
3   V             0  0
4   A             0  0
5   A             0  0
6   A             0  0
7   A             1  1
8   A             1  1
9   A             1  1
10  V             1  1
11  F             1  1
12  A             1  1
13  A             1  1
14  A             1  1
15  A             2  2
16  F             2  2
17  G             2  2
18  A             2  2
19  A             2  2
20  A             2  2
21  A             3  3
22  A             3  3
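
The run-splitting step described above can be hard to read inside the nested call, so here is a minimal sketch of it on its own. The vector id and the name run_id are made up for illustration and are not part of the original answer.

```r
# Toy input (hypothetical, not the asker's data)
id <- c("D", "A", "A", "V", "V", "V", "A")

# rle() finds runs of identical values; repeating the run index by each
# run's length gives one group id per element
runs <- rle(id)
run_id <- rep(seq_along(runs$values), runs$lengths)
# run_id is 1 2 2 3 3 3 4

# split() then yields one vector per run, in order of appearance
groups <- split(id, run_id)
```

Each element of groups is one run of identical IDs, which is what the function passed to lapply receives as x.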

Upvotes: 0

talat

Reputation: 70286

You can use zoo::rollsumr (the right-aligned version of rollsum) for this kind of task, combined with rle:

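library(zoo)
# right-aligned rolling sum over 4 rows; the first 3 rows are filled with NA
x <- rollsumr(df$ID == "A", k = 4, fill = NA)
# every new run of "4 As in the last 4 rows" increments the counter
df$new <- with(rle(!is.na(x) & x == 4), rep(cumsum(values), lengths))

Here k = 4 and x == 4 mean that the counter increments on the 4th consecutive ID == "A", i.e. after 3 such rows have been skipped. To skip n rows instead, use n + 1 for both values.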

The result is equal to your desired output:

all.equal(df$Event_counter, df$new)
#[1] TRUE

The rle part returns:

rle(!is.na(x) & x == 4)
#Run Length Encoding
#  lengths: int [1:6] 6 3 5 1 5 2
#  values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE

Now we can a) compute the cumulative sum of the values, i.e. 0-1-1-2-2-3, and b) use rep to repeat each of these values as many times as its run is long, i.e. by the lengths.
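
If you want the number of skipped rows as a parameter and would rather not depend on zoo, the same idea can be sketched in base R. This is an alternative technique, not the rollsumr approach above: it uses rle plus sequence to get each row's position within its run. The function name count_events and its arguments target and skip are hypothetical, and the sketch assumes ID has no missing values.

```r
# Hypothetical helper: counter increments on the (skip + 1)-th consecutive target
count_events <- function(id, target = "A", skip = 3) {
  hit <- as.character(id) == target
  # position of each element within its run of equal values: 1, 2, 3, ...
  pos <- sequence(rle(hit)$lengths)
  # flag the (skip + 1)-th element of each run of targets, then accumulate
  cumsum(hit & pos == skip + 1)
}

# The ID column from the question
id <- c("D", "F", "V", rep("A", 6), "V", "F", rep("A", 4), "F", "G", rep("A", 5))
res <- count_events(id)
# res is 0 for rows 1-6, 1 for rows 7-14, 2 for rows 15-20, 3 for rows 21-22
```

Because only one position per run is flagged, a long run of As still increments the counter once, which matches the expected output in the question.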

Upvotes: 5

Florian

Reputation: 25385

This seems to do what you want:

df = read.table(text="ID Event_counter 
D          0  
F          0  
V          0
A          0  
A          0  
A          0
A          1  
A          1  
A          1
V          1  
F          1  
A          1
A          1
A          1  
A          2  
F          2  
G          2 
A          2  
A          2  
A          2  
A          3  
A          3",header=TRUE)

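# TRUE where ID is "A"
indices <- df$ID == "A"
# TRUE on the first non-"A" row after a run of "A"s, where the run count must reset
prev <- c(NA, head(indices, -1))
reset.counter <- !indices & prev
# split into groups that each start at a reset point
indices <- unname(split(indices, cumsum(seq_along(indices) %in% which(reset.counter))))
# within each group, flag the 4th "A"; the cumulative sum of the flags is the counter
indices <- unlist(lapply(indices, function(x) cumsum(x) == 4 & x))
df$Event_counter_check <- cumsum(indices)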

OUTPUT

   ID Event_counter Event_counter_check
1   D             0                   0
2   F             0                   0
3   V             0                   0
4   A             0                   0
5   A             0                   0
6   A             0                   0
7   A             1                   1
8   A             1                   1
9   A             1                   1
10  V             1                   1
11  F             1                   1
12  A             1                   1
13  A             1                   1
14  A             1                   1
15  A             2                   2
16  F             2                   2
17  G             2                   2
18  A             2                   2
19  A             2                   2
20  A             2                   2
21  A             3                   3
22  A             3                   3

Hope this helps!

Upvotes: 1
