Reputation: 172
I would like to create a sequence variable that resets when the value of another column exceeds 7. Additionally, I would like the sequencing to be grouped by ID and in order of date.
That is like column 3 in the example data frame below:
# ID Column_1 Column_2 Column_3
# 1 DATE 0 0
# 1 DATE 2 1
# 1 DATE 4 2
# 1 DATE 100 0
# 1 DATE 3 1
# 1 DATE 50 0
# 1 DATE 2 1
# 1 DATE 5 2
# 1 DATE 20 0
# 1 DATE 3 1
# 1 DATE 59 0
# 1 DATE 1 1
I have tried the following but it doesn't seem to work.
df <- df %>%
group_by(ID) %>%
arrange(DATE) %>%
mutate(Column_3 = ave(Column_2, cumsum(Column_2 >7)), .after =
Column2)
Thank you in advance for any help!
Upvotes: 4
Views: 1107
Reputation: 21938
Here is a base R approach:
do.call(rbind, lapply(unique(df$ID), function(x) {
tmp <- subset(df, ID == x)
tmp$date <- sort(tmp$date)
tmp[["Column_3"]] <- Reduce(function(a, b) {
if(b <= 7) {
a + 1
} else {
0
}
}, init = 0, tmp[["Column_2"]][-1], accumulate = TRUE)
tmp
}))
ID Column_1 Column_2 Column_3
1 1 DATE 0 0
2 1 DATE 2 1
3 1 DATE 4 2
4 1 DATE 100 0
5 1 DATE 3 1
6 1 DATE 50 0
7 1 DATE 2 1
8 1 DATE 5 2
9 1 DATE 20 0
10 1 DATE 3 1
11 1 DATE 59 0
12 1 DATE 1 1
Upvotes: 1
Reputation: 1982
A tidyverse option using dplyr
could look as follows. It is not nearly as neat as the solution by GuedesBF, though.
library(dplyr)
df %>%
group_by(ID) %>%
arrange(Column_1) %>%
mutate(grp = cumsum(Column_2 > 7)) %>%
group_by(ID, grp) %>%
mutate(Column_3 = row_number()-1) %>%
ungroup() %>%
select(-grp)
# # A tibble: 12 x 4
# ID Column_1 Column_2 Column_3
# <int> <chr> <int> <dbl>
# 1 1 DATE 0 0
# 2 1 DATE 2 1
# 3 1 DATE 4 2
# 4 1 DATE 100 0
# 5 1 DATE 3 1
# 6 1 DATE 50 0
# 7 1 DATE 2 1
# 8 1 DATE 5 2
# 9 1 DATE 20 0
# 10 1 DATE 3 1
# 11 1 DATE 59 0
# 12 1 DATE 1 1
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), Column_1 = c("DATE", "DATE", "DATE", "DATE", "DATE",
"DATE", "DATE", "DATE", "DATE", "DATE", "DATE", "DATE"), Column_2 = c(0L,
2L, 4L, 100L, 3L, 50L, 2L, 5L, 20L, 3L, 59L, 1L)), class = "data.frame", row.names = c(NA,
-12L))
Upvotes: 1
Reputation: 79204
Use c(F, ..)
to indicate first position and
FUN=seq_along) - 1
to start with 0.
Our running count lead(Column_2) > 7
df %>%
group_by(ID) %>%
arrange(Column_1) %>%
mutate(Column_3 = ave(Column_2, cumsum(c(F, lead(Column_2) > 7)), FUN=seq_along) - 1)
Upvotes: 1
Reputation: 9878
You can also do it with purrr::accumulate()
:
library(dplyr)
library(purrr)
df %>%
group_by(ID) %>%
arrange(Column_1) %>%
mutate(Column_3 = accumulate(Column_2>7, ~ifelse(.y==TRUE, 0, .x+1)))
# A tibble: 12 x 4
# Groups: ID [1]
ID Column_1 Column_2 Column_3
<dbl> <chr> <dbl> <dbl>
1 1 DATE 0 0
2 1 DATE 2 1
3 1 DATE 4 2
4 1 DATE 100 0
5 1 DATE 3 1
6 1 DATE 50 0
7 1 DATE 2 1
8 1 DATE 5 2
9 1 DATE 20 0
10 1 DATE 3 1
11 1 DATE 59 0
12 1 DATE 1 1
data
df1<-data.frame(ID=rep(1, 12), Column_1=rep("DATE", 12), Column_2=c(0,2,4,100, 3,50,2,5,20,3,59,1))
Upvotes: 1