merv

Reputation: 76700

rbind nested tibbles within groups

I have a tibble with a list column of tibbles (with compatible columns). I would like to rbind the tibbles after grouping. Here is a simplified example, where I would like to group on the tpm column:

library(tidyverse)

df_ex <- structure(list(
    tpm = c(3, 3, 5, 5),
    strand = c("negative", "positive", "negative", "positive"),
    sites = list(
        structure(list(chr = c("1", "1"), pos = c(30214L, 31109L), 
                       cov = c(7L, 14L), strand = c("-", "-")), 
                  row.names = c(NA, -2L), 
                  class = c("tbl_df", "tbl", "data.frame")), 
        structure(list(chr = c("1", "1"), pos = c(14362L, 14406L), 
                       cov = c(130L, 5490L), strand = c("+", "+")), 
                  row.names = c(NA, -2L), 
                  class = c("tbl_df", "tbl", "data.frame")), 
        structure(list(chr = c("1", "1"), pos = c(96976L, 98430L), 
                       cov = c(185L,3L), strand = c("-", "-")), 
                  row.names = c(NA, -2L), 
                  class = c("tbl_df", "tbl", "data.frame")), 
        structure(list(chr = c("1", "1"), pos = c(14358L, 14406L), 
                       cov = c(24L, 5246L), strand = c("+", "+")), 
                  row.names = c(NA, -2L), 
                  class = c("tbl_df", "tbl", "data.frame")))), 
    row.names = c(NA, -4L), 
    class = c("tbl_df", "tbl", "data.frame"))

df_ex
# A tibble: 4 × 3
#     tpm strand   sites           
#   <dbl> <chr>    <list>          
# 1     3 negative <tibble [2 × 4]>
# 2     3 positive <tibble [2 × 4]>
# 3     5 negative <tibble [2 × 4]>
# 4     5 positive <tibble [2 × 4]>

I have tried the following:

df_ex %>%
  group_by(tpm) %>%
  transmute(sites=do.call(rbind, sites))

which gives the error

Error in `transmute()`:
! Problem while computing `sites = do.call(rbind, sites)`.
✖ `sites` must be size 2 or 1, not 4.
ℹ The error occurred in group 1: tpm = 3.
Run `rlang::last_error()` to see where the error occurred.

I have also tried using summarize:

df_ex %>% 
    group_by(tpm) %>%
    summarize(sites=do.call(rbind, sites), .groups='drop')

but this leads to expanding the nested tibbles:

# A tibble: 8 × 2
    tpm sites$chr  $pos  $cov $strand
  <dbl> <chr>     <int> <int> <chr>  
1     3 1         30214     7 -      
2     3 1         31109    14 -      
3     3 1         14362   130 +      
4     3 1         14406  5490 +      
5     5 1         96976   185 -      
6     5 1         98430     3 -      
7     5 1         14358    24 +      
8     5 1         14406  5246 + 

Instead, I want a result like:

# A tibble: 2 × 2
#     tpm sites           
#   <dbl> <list>          
# 1     3 <tibble [4 × 4]>
# 2     5 <tibble [4 × 4]>

What is an idiomatic way to do this?

Upvotes: 1

Views: 94

Answers (1)

akrun

Reputation: 886938

Wrap the result in list() inside summarise(), so that each group returns a single list element holding the combined tibble:

library(dplyr)
df_ex %>%
   group_by(tpm) %>%
   summarise(sites = list(bind_rows(sites)), .groups = 'drop')

Output:

# A tibble: 2 × 2
    tpm sites           
  <dbl> <list>          
1     3 <tibble [4 × 4]>
2     5 <tibble [4 × 4]>
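To check that each group really holds the combined rows, the result can be inspected and flattened again. This is only a sketch: `df_small` is a stand-in I built with tibble() using the same values as `df_ex` in the question, and `res` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Compact stand-in for df_ex from the question (same values, built with tibble())
df_small <- tibble(
  tpm = c(3, 3, 5, 5),
  strand = c("negative", "positive", "negative", "positive"),
  sites = list(
    tibble(chr = c("1", "1"), pos = c(30214L, 31109L),
           cov = c(7L, 14L), strand = c("-", "-")),
    tibble(chr = c("1", "1"), pos = c(14362L, 14406L),
           cov = c(130L, 5490L), strand = c("+", "+")),
    tibble(chr = c("1", "1"), pos = c(96976L, 98430L),
           cov = c(185L, 3L), strand = c("-", "-")),
    tibble(chr = c("1", "1"), pos = c(14358L, 14406L),
           cov = c(24L, 5246L), strand = c("+", "+"))
  )
)

# Same approach as above: one list element per group
res <- df_small %>%
  group_by(tpm) %>%
  summarise(sites = list(bind_rows(sites)), .groups = "drop")

# Number of rows in each nested tibble
sapply(res$sites, nrow)

# Flatten the list column to look at the combined rows
unnest(res, sites)
```

The outer `strand` column is dropped by summarise(), so the `strand` column inside the nested tibbles does not clash with anything when unnesting.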

NOTE: Using rbind from base R can cause problems when the list elements do not all have the same column names (for data frames it stops with an error), whereas bind_rows matches columns by name and fills any column missing from one of the list elements with NA.
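A small sketch of that difference, using two made-up tibbles (`a` and `b` are hypothetical and not part of the question's data; `b` deliberately lacks the `cov` column):

```r
library(dplyr)

# Hypothetical toy tibbles: `b` has no `cov` column
a <- tibble(chr = "1", pos = 30214L, cov = 7L)
b <- tibble(chr = "1", pos = 14362L)

# bind_rows() matches columns by name and fills the missing `cov` with NA
combined <- bind_rows(list(a, b))
combined

# Base rbind() stops with an error when the columns differ;
# try() captures the error so the script keeps running
rbind_result <- try(do.call(rbind, list(a, b)), silent = TRUE)
inherits(rbind_result, "try-error")
```

When all nested tibbles are known to share the same columns, as in the question, both functions give the same rows, so the choice mostly matters for messier inputs.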

Upvotes: 2
