Reputation: 13
This question is similar to one already answered: R: Splitting dataframe into subgroups consisting of every consecutive 2 groups
However, rather than splitting into overlapping subgroups that share a type, I need non-overlapping subgroups that each contain two consecutive types. The groups in my actual data also have differing numbers of rows.
df <- data.frame(ID=c('1','1','1','1','1','1','1'), Type=c('a','a','b','c','c','d','d'), value=c(10,2,5,3,7,3,9))
  ID Type value
1  1    a    10
2  1    a     2
3  1    b     5
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9
So subgroup 1 would be Type a and b:
  ID Type value
1  1    a    10
2  1    a     2
3  1    b     5
And subgroup 2 would be Type c and d:
  ID Type value
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9
I have tried manipulating the code from this previous example, but I can't figure out how to make this happen without having overlapping Types in each group. Any help would be greatly appreciated - thanks!
EDIT: thanks for pointing out I didn't actually include the correct link.
Upvotes: 1
Views: 127
Reputation: 76460
Here is an rle() way, written as a function. Pass the data.frame and the split column name as a character string.
df <- data.frame(ID=c('1','1','1','1','1','1','1'),
                 Type=c('a','a','b','c','c','d','d'),
                 value=c(10,2,5,3,7,3,9))
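split_two <- function(x, col) {
  # run-length encode the split column: one run per block of consecutive equal values
  r <- rle(x[[col]])
  # relabel every second run with the label of the run before it
  r$values[c(FALSE, TRUE)] <- r$values[c(TRUE, FALSE)]
  # expand the relabeled runs back to one label per row and split on them
  split(x, inverse.rle(r))
}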
split_two(df, "Type")
#> $a
#>   ID Type value
#> 1  1    a    10
#> 2  1    a     2
#> 3  1    b     5
#>
#> $c
#>   ID Type value
#> 4  1    c     3
#> 5  1    c     7
#> 6  1    d     3
#> 7  1    d     9
Created on 2023-02-09 with reprex v2.0.2
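To see why the relabeling works, it can help to trace the intermediate values on the Type column alone. This is only a sketch of the steps inside split_two(); the variable name grp is made up for illustration.

```r
Type <- c('a','a','b','c','c','d','d')

# One run per block of consecutive equal values
r <- rle(Type)
r$lengths  # 2 1 2 2
r$values   # "a" "b" "c" "d"

# Copy each odd-position label onto the even-position run that follows it
r$values[c(FALSE, TRUE)] <- r$values[c(TRUE, FALSE)]
r$values   # "a" "a" "c" "c"

# Expand back to one label per row; this is the vector split() groups on
grp <- inverse.rle(r)
grp        # "a" "a" "a" "c" "c" "c" "c"
```

Because the groups are named after the first type of each pair, this presumably relies on a type not reappearing in a later pair; that case is not covered by the example data.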
Upvotes: 1
Reputation: 145805
We can do a little manipulation of a dense_rank of the Type variable to make an appropriate grouping variable:
library(dplyr)
df %>%
  group_by(g = (dense_rank(match(Type, Type)) - 1) %/% 2) %>%
  group_split()
# [[1]]
# # A tibble: 3 × 4
#   ID    Type  value     g
#   <chr> <chr> <dbl> <dbl>
# 1 1     a        10     0
# 2 1     a         2     0
# 3 1     b         5     0
#
# [[2]]
# # A tibble: 4 × 4
#   ID    Type  value     g
#   <chr> <chr> <dbl> <dbl>
# 1 1     c         3     1
# 2 1     c         7     1
# 3 1     d         3     1
# 4 1     d         9     1
Explanation: match(Type, Type) converts Type into integers ordered by first appearance, but with gaps (here 1, 1, 3, 4, 4, 6, 6). dense_rank() closes the gaps (1, 1, 2, 3, 3, 4, 4). We then subtract 1 so the ranks start at 0 and integer-divide by 2 with %/% 2, which gives each consecutive pair of types the same group number.
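The same steps can be traced without dplyr. In this sketch, dense_rank() is replaced by a base R equivalent (match() against the sorted unique values), which is a substitution made here so the example needs no packages; the names first_pos, dense and groups are illustrative only.

```r
df <- data.frame(ID = c('1','1','1','1','1','1','1'),
                 Type = c('a','a','b','c','c','d','d'),
                 value = c(10, 2, 5, 3, 7, 3, 9))

# Position of the first occurrence of each type: ordered, but with gaps
first_pos <- match(df$Type, df$Type)                 # 1 1 3 4 4 6 6

# Base R stand-in for dplyr::dense_rank(): close the gaps
dense <- match(first_pos, sort(unique(first_pos)))   # 1 1 2 3 3 4 4

# Start at 0, then integer-divide by 2 so types pair up
g <- (dense - 1) %/% 2                               # 0 0 0 1 1 1 1

# List of two data frames, named "0" and "1"
groups <- split(df, g)
```

Unlike group_split(), split() names the list elements after the group values and does not add a g column to the result.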
Upvotes: 1