AEP
AEP

Reputation: 172

Create sequence variable that restarts based on value of another column grouped by ID

I would like to create a sequence variable that resets when the value of another column exceeds 7. Additionally, I would like the sequencing to be grouped by ID and in order of date.

That is like column 3 in the example data frame below:

# ID     Column_1   Column_2   Column_3
# 1        DATE        0           0
# 1        DATE        2           1
# 1        DATE        4           2
# 1        DATE        100         0 
# 1        DATE        3           1
# 1        DATE        50          0
# 1        DATE        2           1
# 1        DATE        5           2
# 1        DATE        20          0
# 1        DATE        3           1 
# 1        DATE        59          0
# 1        DATE        1           1   

I have tried the following but it doesn't seem to work.

df <- df %>%
  group_by(ID) %>%
  arrange(DATE) %>%
  mutate(Column_3 = ave(Column_2, cumsum(Column_2 >7)), .after = 
  Column2)

Thank you in advance for any help!

Upvotes: 4

Views: 1107

Answers (4)

Anoushiravan R
Anoushiravan R

Reputation: 21938

Here is a base R approach:

do.call(rbind, lapply(unique(df$ID), function(x) {
  tmp <- subset(df, ID == x)
  tmp$date <- sort(tmp$date)
  tmp[["Column_3"]] <- Reduce(function(a, b) {
    if(b <= 7) {
      a + 1
    } else {
      0
    }
  }, init = 0, tmp[["Column_2"]][-1], accumulate = TRUE)
  tmp
}))

   ID Column_1 Column_2 Column_3
1   1     DATE        0        0
2   1     DATE        2        1
3   1     DATE        4        2
4   1     DATE      100        0
5   1     DATE        3        1
6   1     DATE       50        0
7   1     DATE        2        1
8   1     DATE        5        2
9   1     DATE       20        0
10  1     DATE        3        1
11  1     DATE       59        0
12  1     DATE        1        1

Upvotes: 1

rjen
rjen

Reputation: 1982

A tidyverse option using dplyr could look as follows. It is not nearly as neat as the solution by GuedesBF, though.

library(dplyr)

df %>%
  group_by(ID) %>%
  arrange(Column_1) %>%
  mutate(grp = cumsum(Column_2 > 7)) %>%
  group_by(ID, grp) %>%
  mutate(Column_3 = row_number()-1) %>%
  ungroup() %>%
  select(-grp)

# # A tibble: 12 x 4
#       ID Column_1 Column_2 Column_3
#    <int> <chr>       <int>    <dbl>
#  1     1 DATE            0        0
#  2     1 DATE            2        1
#  3     1 DATE            4        2
#  4     1 DATE          100        0
#  5     1 DATE            3        1
#  6     1 DATE           50        0
#  7     1 DATE            2        1
#  8     1 DATE            5        2
#  9     1 DATE           20        0
# 10     1 DATE            3        1
# 11     1 DATE           59        0
# 12     1 DATE            1        1

Data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), Column_1 = c("DATE", "DATE", "DATE", "DATE", "DATE", 
"DATE", "DATE", "DATE", "DATE", "DATE", "DATE", "DATE"), Column_2 = c(0L, 
2L, 4L, 100L, 3L, 50L, 2L, 5L, 20L, 3L, 59L, 1L)), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 1

TarJae
TarJae

Reputation: 79204

Use c(F, ..) to indicate first position and

FUN=seq_along) - 1 to start with 0.

Our running count lead(Column_2) > 7

df %>%
    group_by(ID) %>%
    arrange(Column_1) %>%
    mutate(Column_3 = ave(Column_2, cumsum(c(F, lead(Column_2) > 7)), FUN=seq_along) - 1)

Upvotes: 1

GuedesBF
GuedesBF

Reputation: 9878

You can also do it with purrr::accumulate():

library(dplyr)
library(purrr)

df %>%
        group_by(ID) %>%
        arrange(Column_1) %>%
        mutate(Column_3 = accumulate(Column_2>7, ~ifelse(.y==TRUE, 0, .x+1)))

# A tibble: 12 x 4
# Groups:   ID [1]
      ID Column_1 Column_2 Column_3
   <dbl> <chr>       <dbl>    <dbl>
 1     1 DATE            0        0
 2     1 DATE            2        1
 3     1 DATE            4        2
 4     1 DATE          100        0
 5     1 DATE            3        1
 6     1 DATE           50        0
 7     1 DATE            2        1
 8     1 DATE            5        2
 9     1 DATE           20        0
10     1 DATE            3        1
11     1 DATE           59        0
12     1 DATE            1        1

data

df1<-data.frame(ID=rep(1, 12), Column_1=rep("DATE", 12), Column_2=c(0,2,4,100, 3,50,2,5,20,3,59,1))

Upvotes: 1

Related Questions