Reputation: 230
Below is a data frame df with one variable, ID, and 500K data points. I need to implement an event counter with the following conditions:
1. Increment the event counter when ID == A.
2. The first 3 data points should not be considered for the counter increment, even though ID == A.
The data frame df with the expected output is shown below:
ID Event Counter
D 0
F 0
V 0
A 0
A 0
A 0
A 1
A 1
A 1
V 1
F 1
A 1
A 1
A 1
A 2
F 2
G 2
A 2
A 2
A 2
A 3
A 3
Please note: rows 1, 2 and 3 do not satisfy the condition, hence there is no increment in Event Counter. Although ID == A in rows 4, 5 and 6, the event counter does not increment (reference: condition 2). The same applies to rows 12, 13 and 14.
I found a similar question where the counter increments at every data point that satisfies the condition, but my implementation conditions are different.
Upvotes: 2
Views: 245
Reputation: 38510
Here is a base R alternative using split and lapply.
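# Label each run of identical IDs, split ID by run, mark the 4th element
# of every run of "A" with a 1, then take the cumulative sum.
run_id <- with(rle(as.character(dat$ID)), rep(seq_along(values), lengths))
dat$v3 <- cumsum(unlist(lapply(split(dat$ID, run_id), function(x) {
  v <- length(x)
  # runs of "A" longer than 3: three 0s, a single 1, then 0s
  if (x[1] == "A" && v > 3) rep(c(0, 1, 0), c(3, 1, v - 4))
  else rep(0, v)
})))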
The ID variable is split using a method similar to that in docendo-discimus's answer, splitting on runs of the same ID. This list is fed to lapply, which checks whether the group is composed of As and whether it has more than 3 elements. If so, a vector of three 0s, followed by a 1, with 0s for the remaining elements, is returned to match the length of the group. If the check fails, a vector of 0s of the same length is returned. The cumulative sum of these vectors is the counter.
This returns
dat
ID Event_Counter v3
1 D 0 0
2 F 0 0
3 V 0 0
4 A 0 0
5 A 0 0
6 A 0 0
7 A 1 1
8 A 1 1
9 A 1 1
10 V 1 1
11 F 1 1
12 A 1 1
13 A 1 1
14 A 1 1
15 A 2 2
16 F 2 2
17 G 2 2
18 A 2 2
19 A 2 2
20 A 2 2
21 A 3 3
22 A 3 3
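The run-labelling step used in this answer can be tried in isolation. The following is a minimal sketch on a made-up five-element vector (id, run_id, runs and marks are illustrative names, not taken from the answer). It shows how rle turns consecutive repeats into run labels and how rep builds the 0/1 marker for a run of six As.

```r
# Toy input: one run of two "A"s between other IDs (illustrative data)
id <- c("D", "A", "A", "V", "A")

# rle() gives the run values and their lengths; repeating the run index
# by each run's length labels every element with its run number
run_id <- with(rle(id), rep(seq_along(values), lengths))
print(run_id)

# split() then groups the elements run by run
runs <- split(id, run_id)

# Marker for a run of six "A"s: three 0s, one 1, then 0s for the rest
v <- 6
marks <- rep(c(0, 1, 0), c(3, 1, v - 4))
print(marks)
```

Taking cumsum over the concatenated markers of all runs then yields the counter.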
Upvotes: 0
Reputation: 70286
You can use a rolling sum from zoo (rollsumr, the right-aligned variant of zoo::rollsum) for this kind of task, combined with rle:
library(zoo)
x <- rollsumr(df$ID == "A", k=4, fill = NA)
df$new <- with(rle(!is.na(x) & x == 4), rep(cumsum(values), lengths))
Here k = 4 and x == 4 mean that 3 consecutive cases of ID == "A" must occur before the counter increments on the 4th. You can change this number as you wish.
The result is equal to your desired output:
all.equal(df$Event_counter, df$new)
#[1] TRUE
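If the threshold needs to vary, it can be wrapped in a function. The sketch below is a base R variant built on rle alone, not the zoo approach above. The names count_events, target and skip are hypothetical and introduced only for illustration, and it assumes the counter should increment once per run, at the element right after the first skip matches.

```r
# Hypothetical helper: increments once per run of `target`, at the
# (skip + 1)-th consecutive match. Not part of zoo or base R.
count_events <- function(id, target = "A", skip = 3) {
  is_target <- !is.na(id) & as.character(id) == target
  r <- rle(is_target)
  # first index of every run
  starts <- cumsum(r$lengths) - r$lengths + 1
  # runs of the target that are longer than `skip` trigger one increment
  hit <- r$values & r$lengths > skip
  inc <- integer(length(id))
  inc[starts[hit] + skip] <- 1L
  cumsum(inc)
}

# Same ID sequence as in the question
id <- c("D", "F", "V", rep("A", 6), "V", "F", rep("A", 4),
        "F", "G", rep("A", 5))
counter <- count_events(id)
print(counter)
```

Calling count_events(id, skip = 0) would instead increment at the first element of every run of "A".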
The rle part returns:
rle(!is.na(x) & x == 4)
#Run Length Encoding
# lengths: int [1:6] 6 3 5 1 5 2
# values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE
Now we can a) compute the cumulative sum of the values, i.e. 0-1-1-2 ..., and b) use rep to repeat each of these values as many times as its run is long, i.e. according to the lengths.
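Steps a) and b) can be checked on a small logical vector. This is a minimal sketch with made-up data; flag, run_totals and counter are illustrative names.

```r
# Toy logical vector standing in for !is.na(x) & x == 4
flag <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
r <- rle(flag)

# a) cumulative sum over the run values: each TRUE run adds one
run_totals <- cumsum(r$values)

# b) repeat each total as many times as its run is long
counter <- rep(run_totals, r$lengths)
print(counter)
```

Because a whole run of TRUE collapses into a single value before the cumulative sum, a long run of matches still adds only one to the counter.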
Upvotes: 5
Reputation: 25385
This seems to do what you want:
df = read.table(text="ID Event_counter
D 0
F 0
V 0
A 0
A 0
A 0
A 1
A 1
A 1
V 1
F 1
A 1
A 1
A 1
A 2
F 2
G 2
A 2
A 2
A 2
A 3
A 3",header=TRUE)
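# TRUE where ID is "A"
indices <- df$ID == "A"
prev <- c(NA, head(indices, -1))
# TRUE at the first non-"A" row after a run of "A": the run counter restarts here
reset.counter <- indices != prev & !indices & prev
# split into chunks at the reset points; flag the 4th "A" within each chunk
indices <- unname(split(indices, cumsum(seq_along(indices) %in% which(reset.counter))))
indices <- unlist(lapply(indices, function(x) cumsum(x) == 4 & x))
df$Event_counter_check <- cumsum(indices)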
OUTPUT
ID Event_counter Event_counter_check
1 D 0 0
2 F 0 0
3 V 0 0
4 A 0 0
5 A 0 0
6 A 0 0
7 A 1 1
8 A 1 1
9 A 1 1
10 V 1 1
11 F 1 1
12 A 1 1
13 A 1 1
14 A 1 1
15 A 2 2
16 F 2 2
17 G 2 2
18 A 2 2
19 A 2 2
20 A 2 2
21 A 3 3
22 A 3 3
Hope this helps!
Upvotes: 1