mkmor

Reputation: 13

R: splitting dataframe into distinct subgroups containing sequence of groups

This question is similar to one already answered: R: Splitting dataframe into subgroups consisting of every consecutive 2 groups

However, rather than splitting into overlapping subgroups that share a type, I need distinct (non-overlapping) subgroups that each contain two consecutive types. The groups in my actual data also have differing numbers of rows.

df <- data.frame(ID=c('1','1','1','1','1','1','1'), Type=c('a','a','b','c','c','d','d'), value=c(10,2,5,3,7,3,9))

   ID Type value
1  1    a    10
2  1    a     2
3  1    b     5
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9

So subgroup 1 would be Type a and b:

   ID Type value
1  1    a    10
2  1    a     2
3  1    b     5

And subgroup 2 would be Type c and d:

   ID Type value
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9

I have tried adapting the code from that previous question, but I can't figure out how to do this without Types overlapping between subgroups. Any help would be greatly appreciated. Thanks!

EDIT: thanks for pointing out I didn't actually include the correct link.

Upvotes: 1

Views: 127

Answers (2)

Rui Barradas

Reputation: 76460

Here is an rle-based approach, written as a function. Pass the data.frame and the name of the column to split on as a character string.

df <- data.frame(ID=c('1','1','1','1','1','1','1'), 
                 Type=c('a','a','b','c','c','d','d'), 
                 value=c(10,2,5,3,7,3,9))

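split_two <- function(x, col) {
  # Run-length encode the split column: one entry per run of equal values
  r <- rle(x[[col]])
  # Give every second run the label of the run before it, pairing the runs up
  r$values[c(FALSE, TRUE)] <- r$values[c(TRUE, FALSE)]
  # Expand back to one label per row and split the data frame on it
  split(x, inverse.rle(r))
}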
split_two(df, "Type")
#> $a
#>   ID Type value
#> 1  1    a    10
#> 2  1    a     2
#> 3  1    b     5
#> 
#> $c
#>   ID Type value
#> 4  1    c     3
#> 5  1    c     7
#> 6  1    d     3
#> 7  1    d     9

Created on 2023-02-09 with reprex v2.0.2
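
To make the steps inside split_two easier to follow, here is a small sketch that prints the intermediate values for the example data. The second half is only an illustration under my own assumptions: split_two_by_index is a hypothetical helper name, not part of the answer above. It numbers the runs instead of reusing their labels, which should also cope with an odd number of runs or with a column stored as a factor.

```r
df <- data.frame(ID = rep("1", 7),
                 Type = c("a", "a", "b", "c", "c", "d", "d"),
                 value = c(10, 2, 5, 3, 7, 3, 9))

# Step 1: run-length encoding of the Type column
r <- rle(df$Type)
r$values    # "a" "b" "c" "d"
r$lengths   # 2 1 2 2

# Step 2: every second run takes the label of the run before it
r$values[c(FALSE, TRUE)] <- r$values[c(TRUE, FALSE)]
r$values    # "a" "a" "c" "c"

# Step 3: expand to one label per row; this is what split() groups on
grp <- inverse.rle(r)
grp         # "a" "a" "a" "c" "c" "c" "c"

# Hypothetical variant (assumption, not from the answer): group by run index
split_two_by_index <- function(x, col) {
  r <- rle(as.character(x[[col]]))
  # Runs 1 and 2 get group 1, runs 3 and 4 get group 2, and so on
  r$values <- (seq_along(r$values) + 1) %/% 2
  split(x, inverse.rle(r))
}

res <- split_two_by_index(df, "Type")
names(res)  # "1" "2"
```

With an odd number of runs, the last subgroup simply contains a single type.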

Upvotes: 1

Gregor Thomas

Reputation: 145805

We can do a little manipulation of a dense_rank of the Type variable to make an appropriate grouping variable:

library(dplyr)
df %>%
  group_by(g = (dense_rank(match(Type, Type)) - 1) %/% 2) %>%
  group_split()
  
# [[1]]
# # A tibble: 3 × 4
#   ID    Type  value     g
#   <chr> <chr> <dbl> <dbl>
# 1 1     a        10     0
# 2 1     a         2     0
# 3 1     b         5     0
# 
# [[2]]
# # A tibble: 4 × 4
#   ID    Type  value     g
#   <chr> <chr> <dbl> <dbl>
# 1 1     c         3     1
# 2 1     c         7     1
# 3 1     d         3     1
# 4 1     d         9     1

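Explanation: match(Type, Type) gives each row the position of the first occurrence of its Type, so the result is integers ordered by first appearance, but with gaps (here 1, 1, 3, 4, 4, 6, 6). dense_rank() removes the gaps (1, 1, 2, 3, 3, 4, 4). We then subtract 1 so the numbering starts at 0 and apply %/% 2 (integer division by 2), which gives each consecutive pair of types the same group number.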
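
As a quick check of that explanation, the grouping variable can be built one step at a time on the example Type column. This is just a sketch for illustration; the intermediate names first_pos, dense and g are my own.

```r
library(dplyr)

Type <- c("a", "a", "b", "c", "c", "d", "d")

# Position of the first occurrence of each value: ordered, but with gaps
first_pos <- match(Type, Type)      # 1 1 3 4 4 6 6

# Remove the gaps so the distinct types are numbered 1, 2, 3, ...
dense <- dense_rank(first_pos)      # 1 1 2 3 3 4 4

# Shift to start at 0, then integer-divide by 2 to pair up the types
g <- (dense - 1) %/% 2              # 0 0 0 1 1 1 1
```

Note that this numbers the distinct values of Type by first appearance, so a Type that shows up again later in the data would land in the same group as its first occurrence.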

Upvotes: 1
