Shu
Shu

Reputation: 57

How to label groups in df based on specific sequence of values in a column

I have a dataframe with the id and value columns indicated below, but want to determine the Status column based on values in the value column, by id groups.

  
x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
                serial = rep(1:10,3),
                value = c(rep(1,4), rep(0,3), rep(1,3),
                          rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
                          rep(c(1,0),5)),
                status = c(rep("Fluctuating", 10),
                           rep("Fluctuating", 10),
                           rep("Not fluctuating", 10)))
   id serial value          status
1   1      1     1     Fluctuating
2   1      2     1     Fluctuating
3   1      3     1     Fluctuating
4   1      4     1     Fluctuating
5   1      5     0     Fluctuating
6   1      6     0     Fluctuating
7   1      7     0     Fluctuating
8   1      8     1     Fluctuating
9   1      9     1     Fluctuating
10  1     10     1     Fluctuating
11  2      1     1     Fluctuating
12  2      2     1     Fluctuating
13  2      3     1     Fluctuating
14  2      4     1     Fluctuating
15  2      5     0     Fluctuating
16  2      6    -1     Fluctuating
17  2      7    -1     Fluctuating
18  2      8     1     Fluctuating
19  2      9     1     Fluctuating
20  2     10     1     Fluctuating
21  3      1     1 Not fluctuating
22  3      2     0 Not fluctuating
23  3      3     1 Not fluctuating
24  3      4     0 Not fluctuating
25  3      5     1 Not fluctuating
26  3      6     0 Not fluctuating
27  3      7     1 Not fluctuating
28  3      8     0 Not fluctuating
29  3      9     1 Not fluctuating
30  3     10     0 Not fluctuating

Here, a group is considered to be fluctuating if three or more 1s is followed by 3 or more (0s or -1s), followed by 3 or more 1s again. It would also be considered fluctuating if three or more alternating 0s-1s-0s, -1s-0s-1s, etc.

Wondering what the best way is to assign the status column, preferably using dplyr?

Thanks!

Upvotes: 1

Views: 91

Answers (2)

r2evans
r2evans

Reputation: 160607

library(dplyr)
# library(zoo) # rollapply
threes <- function(z, minlen = 3L, ptn = c(TRUE, FALSE, TRUE)) {
  r <- rle(z > 0)
  starts <- zoo::rollapply(r$lengths >= minlen, minlen, all, fill = FALSE, align = "left")
  for (st in which(starts)) {
    if (all(r$values[st + seq_len(minlen) - 1L] == ptn)) return(TRUE)
  }
  return(FALSE)
}

x %>%
  group_by(id) %>%
  mutate(status2 = paste0(if (threes(value)) "" else "Not ", "Fluctuating")) %>%
  ungroup() %>%
  print(n = 99)
# # A tibble: 30 x 5
#       id serial value status          status2        
#    <dbl>  <int> <dbl> <chr>           <chr>          
#  1     1      1     1 Fluctuating     Fluctuating    
#  2     1      2     1 Fluctuating     Fluctuating    
#  3     1      3     1 Fluctuating     Fluctuating    
#  4     1      4     1 Fluctuating     Fluctuating    
#  5     1      5     0 Fluctuating     Fluctuating    
#  6     1      6     0 Fluctuating     Fluctuating    
#  7     1      7     0 Fluctuating     Fluctuating    
#  8     1      8     1 Fluctuating     Fluctuating    
#  9     1      9     1 Fluctuating     Fluctuating    
# 10     1     10     1 Fluctuating     Fluctuating    
# 11     2      1     1 Fluctuating     Fluctuating    
# 12     2      2     1 Fluctuating     Fluctuating    
# 13     2      3     1 Fluctuating     Fluctuating    
# 14     2      4     1 Fluctuating     Fluctuating    
# 15     2      5     0 Fluctuating     Fluctuating    
# 16     2      6    -1 Fluctuating     Fluctuating    
# 17     2      7    -1 Fluctuating     Fluctuating    
# 18     2      8     1 Fluctuating     Fluctuating    
# 19     2      9     1 Fluctuating     Fluctuating    
# 20     2     10     1 Fluctuating     Fluctuating    
# 21     3      1     1 Not fluctuating Not Fluctuating
# 22     3      2     0 Not fluctuating Not Fluctuating
# 23     3      3     1 Not fluctuating Not Fluctuating
# 24     3      4     0 Not fluctuating Not Fluctuating
# 25     3      5     1 Not fluctuating Not Fluctuating
# 26     3      6     0 Not fluctuating Not Fluctuating
# 27     3      7     1 Not fluctuating Not Fluctuating
# 28     3      8     0 Not fluctuating Not Fluctuating
# 29     3      9     1 Not fluctuating Not Fluctuating
# 30     3     10     0 Not fluctuating Not Fluctuating

Upvotes: 1

Ric S
Ric S

Reputation: 9257

Using rle function and dplyr library

x %>% 
  mutate(value_new = ifelse(value == -1, 0, value)) %>% 
  group_by(id) %>% 
  mutate(status = ifelse(all(rle(value_new)$lengths >= 3), "Fluctuating", "Not fluctuating")) %>% 
  select(-value_new) 

Output

# A tibble: 30 x 4
# Groups:   id [3]
      id serial value status         
   <dbl>  <int> <dbl> <chr>          
 1     1      1     1 Fluctuating    
 2     1      2     1 Fluctuating    
 3     1      3     1 Fluctuating    
 4     1      4     1 Fluctuating    
 5     1      5     0 Fluctuating    
 6     1      6     0 Fluctuating    
 7     1      7     0 Fluctuating    
 8     1      8     1 Fluctuating    
 9     1      9     1 Fluctuating    
10     1     10     1 Fluctuating    
11     2      1     1 Fluctuating    
12     2      2     1 Fluctuating    
13     2      3     1 Fluctuating    
14     2      4     1 Fluctuating    
15     2      5     0 Fluctuating    
16     2      6    -1 Fluctuating    
17     2      7    -1 Fluctuating    
18     2      8     1 Fluctuating    
19     2      9     1 Fluctuating    
20     2     10     1 Fluctuating    
21     3      1     1 Not fluctuating
22     3      2     0 Not fluctuating
23     3      3     1 Not fluctuating
24     3      4     0 Not fluctuating
25     3      5     1 Not fluctuating
26     3      6     0 Not fluctuating
27     3      7     1 Not fluctuating
28     3      8     0 Not fluctuating
29     3      9     1 Not fluctuating
30     3     10     0 Not fluctuating

Upvotes: 1

Related Questions