Reputation: 1149
In the example below, the event start is defined as when the prior value of "values" is 90 or more and the current value is below 90. The event end is when the current value is below 90 and the next value is 90 or above.
sequential_index <- seq(1,10)
values <- c(91,90,89,89,90,90,89,88,90,91)
df <- data.frame(sequential_index, values)
Looking at df in the example above, the first event occurs for observations 3-4 and the second event occurs for observations 7-8. I am trying, to no avail, to add an "events" column to the above data frame that looks something like this:
sequential_index values events
1 1 91 NA
2 2 90 NA
3 3 89 1
4 4 89 1
5 5 90 NA
6 6 90 NA
7 7 89 2
8 8 88 2
9 9 90 NA
10 10 91 NA
My dataset is rather large and I'm trying to avoid for loops.
Thanks in advance,
-jt
Upvotes: 5
Views: 190
Reputation: 887511
One option in base R is rle:
df$events <- inverse.rle(within.list(rle(df$values < 90),
values[values] <- seq_along(values[values])
))
df$events[df$events == 0] <- NA
df$events
#[1] NA NA 1 1 NA NA 2 2 NA NA
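To make the rle step easier to follow, here is a minimal sketch that unpacks the same idea into separate steps. It is an illustration rather than a replacement for the one-liner above; the names runs and ids are made up for this example.

```r
# Example data from the question
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Run-length encode the condition: one entry per run of equal TRUE/FALSE
runs <- rle(values < 90)
runs$lengths  # 2 2 2 2 2
runs$values   # FALSE TRUE FALSE TRUE FALSE

# Number the TRUE runs 1, 2, ...; leave the FALSE runs as NA
ids <- rep(NA_integer_, length(runs$values))
ids[runs$values] <- seq_len(sum(runs$values))

# Expand back to one entry per observation
events <- rep(ids, runs$lengths)
events  # NA NA 1 1 NA NA 2 2 NA NA
```

Note that this labels every run of values below 90, so a run that starts at the first observation or ends at the last one would also be numbered, even though it has no prior or next value of 90 or more.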
Or, more compactly, with data.table:
library(data.table)
# Label runs of values < 90, blank out the other runs, then renumber 1, 2, ...
setDT(df)[, events := as.integer(factor(replace(rleid(values < 90), values >= 90, NA)))]
Upvotes: 2
Reputation: 690
I have this solution using dplyr.
library(dplyr)
df %>%
# Define the start of events (putting 1 at the start of events)
mutate(events = case_when(lag(values)>=90 & values<90 ~ 1, TRUE ~ 0)) %>%
# Extend the events using cumsum()
mutate(events = case_when(values<90 ~ cumsum(events)))
Output:
sequential_index values events
1 1 91 NA
2 2 90 NA
3 3 89 1
4 4 89 1
5 5 90 NA
6 6 90 NA
7 7 89 2
8 8 88 2
9 9 90 NA
10 10 91 NA
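For anyone who wants the same cumsum() idea without loading a package, here is a rough base R sketch. It assumes the same example data, and the helper names below, prev and starts are only for illustration.

```r
# Example data from the question
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

below <- values < 90
# Previous observation's value; the first row has no predecessor
prev <- c(NA, head(values, -1))
# An event starts where the value drops below 90 from 90 or above
starts <- below & !is.na(prev) & prev >= 90
# Running count of starts, kept only on the below-90 rows
events <- ifelse(below, cumsum(starts), NA_integer_)
events  # NA NA 1 1 NA NA 2 2 NA NA
```

As with the dplyr version, a below-90 run that begins on the very first row has no recorded start, so it would come out as 0 rather than as a numbered event.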
Upvotes: 3