Coolsun
Coolsun

Reputation: 189

How to split data based on range

I need to split the data into multiple groups based on range of values. Range here is 20 to 73.

As you can see in the image below, I need to split into such 3 different groups. You should also notice that right after a value hits 70-73 range, next value would be around 40's and then falls to 20's so the transition is gradual.

I am not concerned about the transitory values and intend to ignore them.

Sample data:

structure(list(V1 = c(27, 28, 34, 35, 47, 50, 52, 54, 55, 68, 
                      69, 73, 45, 39, 30, 21, 23, 24, 22, 26, 
                      29, 31, 32, 35, 42, 44, 46, 50, 55, 66, 
                      69, 70, 47, 40, 33, 21, 22, 29, 31, 38, 
                      47, 55, 59, 64, 66, 71)), 
          class = "data.frame", 
          row.names = c(NA, -45L))

Code I tried:

df[, ID := cumsum(V1>=73)+1]

Expected Output

Plot of data & Expected Grouping

Upvotes: 1

Views: 1086

Answers (2)

DeduciveR
DeduciveR

Reputation: 1702

Here's an alternative again with dplyr:

df2 <- df %>% 
  mutate(high_val = if_else(V1 %in% tail(sort(V1),3), 1, 0)) %>%
  mutate(cs_val   = 1 + lag(cumsum(high_val))) %>%
  replace_na(list(cs_val = 1, y = "unknown")) %>% 
  group_by(cs_val) %>%
  mutate(counter  =  row_number(cs_val)) %>%
  mutate(min_val  =  if_else(V1 == min(V1), 1, 0)) %>%
  mutate(cs_count =  cumsum(min_val)) %>% 
  filter(cs_count != 0) %>% 
  select(V1, groups = cs_val)

Not sure this is any less complex than the accepted answer. Basically I've created a bunch of columns to track mins and maxes within the groups I've set and filtered out the transitory values.

Result:

# A tibble: 40 x 2
      V1 groups
   <dbl>  <dbl>
 1    27      1
 2    28      1
 3    34      1
 4    35      1
 5    47      1
 6    50      1
 7    52      1
 8    54      1
 9    55      1
10    68      1
11    69      1
12    73      1
13    21      2
14    23      2
15    24      2
16    22      2
17    26      2
18    29      2
19    31      2
20    32      2
21    35      2
22    42      2
23    44      2
24    46      2
25    50      2
26    55      2
27    66      2
28    69      2
29    70      2
30    21      3
31    22      3
32    29      3
33    31      3
34    38      3
35    47      3
36    55      3
37    59      3
38    64      3
39    66      3
40    71      3

Upvotes: 1

arg0naut91
arg0naut91

Reputation: 14774

Perhaps this would work for you:

library(dplyr)

df %>%
  group_by(groups = cumsum(coalesce(as.numeric(V1 < lag(V1) & lag(V1) >= 70), 1))) %>%
  filter(!coalesce(lead(cumsum(coalesce(as.numeric(V1 > lag(V1)), 1))), 99) == 1) %>%
  arrange(groups, V1)

Output:

   V1 groups
1  27      1
2  28      1
3  34      1
4  35      1
5  47      1
6  50      1
7  52      1
8  54      1
9  55      1
10 68      1
11 69      1
12 73      1
13 21      2
14 22      2
15 23      2
16 24      2
17 26      2
18 29      2
19 31      2
20 32      2
21 35      2
22 42      2
23 44      2
24 46      2
25 50      2
26 55      2
27 66      2
28 69      2
29 70      2
30 21      3
31 22      3
32 29      3
33 31      3
34 38      3
35 47      3
36 55      3
37 59      3
38 64      3
39 66      3
40 71      3

Data:

df <- structure(list(V1 = c(27, 28, 34, 35, 47, 50, 52, 54, 55, 68, 
69, 73, 45, 39, 30, 21, 23, 24, 22, 26, 29, 31, 32, 35, 42, 44, 
46, 50, 55, 66, 69, 70, 47, 40, 33, 21, 22, 29, 31, 38, 47, 55, 
59, 64, 66, 71)), class = "data.frame", row.names = c(NA, -46L
))

Upvotes: 1

Related Questions