Reputation: 189
I need to split the data into multiple groups based on range of values. Range here is 20 to 73.
As you can see in the image below, I need to split into such 3 different groups. You should also notice that right after a value hits 70-73 range, next value would be around 40's and then falls to 20's so the transition is gradual.
I am not concerned about the transitory values and intend to ignore them.
Sample data:
structure(list(V1 = c(27, 28, 34, 35, 47, 50, 52, 54, 55, 68,
69, 73, 45, 39, 30, 21, 23, 24, 22, 26,
29, 31, 32, 35, 42, 44, 46, 50, 55, 66,
69, 70, 47, 40, 33, 21, 22, 29, 31, 38,
47, 55, 59, 64, 66, 71)),
class = "data.frame",
row.names = c(NA, -45L))
Code I tried:
df[, ID := cumsum(V1>=73)+1]
Upvotes: 1
Views: 1086
Reputation: 1702
Here's an alternative again with dplyr
:
df2 <- df %>%
mutate(high_val = if_else(V1 %in% tail(sort(V1),3), 1, 0)) %>%
mutate(cs_val = 1 + lag(cumsum(high_val))) %>%
replace_na(list(cs_val = 1, y = "unknown")) %>%
group_by(cs_val) %>%
mutate(counter = row_number(cs_val)) %>%
mutate(min_val = if_else(V1 == min(V1), 1, 0)) %>%
mutate(cs_count = cumsum(min_val)) %>%
filter(cs_count != 0) %>%
select(V1, groups = cs_val)
Not sure this is any less complex than the accepted answer. Basically I've created a bunch of columns to track mins and maxes within the groups I've set and filtered out the transitory values.
Result:
# A tibble: 40 x 2
V1 groups
<dbl> <dbl>
1 27 1
2 28 1
3 34 1
4 35 1
5 47 1
6 50 1
7 52 1
8 54 1
9 55 1
10 68 1
11 69 1
12 73 1
13 21 2
14 23 2
15 24 2
16 22 2
17 26 2
18 29 2
19 31 2
20 32 2
21 35 2
22 42 2
23 44 2
24 46 2
25 50 2
26 55 2
27 66 2
28 69 2
29 70 2
30 21 3
31 22 3
32 29 3
33 31 3
34 38 3
35 47 3
36 55 3
37 59 3
38 64 3
39 66 3
40 71 3
Upvotes: 1
Reputation: 14774
Perhaps this would work for you:
library(dplyr)
df %>%
group_by(groups = cumsum(coalesce(as.numeric(V1 < lag(V1) & lag(V1) >= 70), 1))) %>%
filter(!coalesce(lead(cumsum(coalesce(as.numeric(V1 > lag(V1)), 1))), 99) == 1) %>%
arrange(groups, V1)
Output:
V1 groups
1 27 1
2 28 1
3 34 1
4 35 1
5 47 1
6 50 1
7 52 1
8 54 1
9 55 1
10 68 1
11 69 1
12 73 1
13 21 2
14 22 2
15 23 2
16 24 2
17 26 2
18 29 2
19 31 2
20 32 2
21 35 2
22 42 2
23 44 2
24 46 2
25 50 2
26 55 2
27 66 2
28 69 2
29 70 2
30 21 3
31 22 3
32 29 3
33 31 3
34 38 3
35 47 3
36 55 3
37 59 3
38 64 3
39 66 3
40 71 3
Data:
df <- structure(list(V1 = c(27, 28, 34, 35, 47, 50, 52, 54, 55, 68,
69, 73, 45, 39, 30, 21, 23, 24, 22, 26, 29, 31, 32, 35, 42, 44,
46, 50, 55, 66, 69, 70, 47, 40, 33, 21, 22, 29, 31, 38, 47, 55,
59, 64, 66, 71)), class = "data.frame", row.names = c(NA, -46L
))
Upvotes: 1