Reputation: 76700
I have a tibble with a list column of tibbles (with compatible columns). I would like to rbind
the tibbles after grouping. Here is a simplified example, where I would like to group on the tpm
column:
library(tidyverse)
df_ex <- structure(list(
tpm = c(3, 3, 5, 5),
strand = c("negative", "positive", "negative", "positive"),
sites = list(
structure(list(chr = c("1", "1"), pos = c(30214L, 31109L),
cov = c(7L, 14L), strand = c("-", "-")),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame")),
structure(list(chr = c("1", "1"), pos = c(14362L, 14406L),
cov = c(130L, 5490L), strand = c("+", "+")),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame")),
structure(list(chr = c("1", "1"), pos = c(96976L, 98430L),
cov = c(185L,3L), strand = c("-", "-")),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame")),
structure(list(chr = c("1", "1"), pos = c(14358L, 14406L),
cov = c(24L, 5246L), strand = c("+", "+")),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame")))),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame"))
df_ex
## A tibble: 4 × 3
# tpm strand sites
# <dbl> <chr> <list>
# 1 3 negative <tibble [2 × 4]>
# 2 3 positive <tibble [2 × 4]>
# 3 5 negative <tibble [2 × 4]>
# 4 5 positive <tibble [2 × 4]>
I have tried the following:
df_ex %>%
  group_by(tpm) %>%
  transmute(sites = do.call(rbind, sites))
which gives the error
Error in `transmute()`:
! Problem while computing `sites = do.call(rbind, sites)`.
✖ `sites` must be size 2 or 1, not 4.
ℹ The error occurred in group 1: tpm = 3.
Run `rlang::last_error()` to see where the error occurred.
I have also tried using summarize:
df_ex %>%
  group_by(tpm) %>%
  summarize(sites = do.call(rbind, sites), .groups = 'drop')
but this leads to expanding the nested tibbles:
# A tibble: 8 × 2
tpm sites$chr $pos $cov $strand
<dbl> <chr> <int> <int> <chr>
1 3 1 30214 7 -
2 3 1 31109 14 -
3 3 1 14362 130 +
4 3 1 14406 5490 +
5 5 1 96976 185 -
6 5 1 98430 3 -
7 5 1 14358 24 +
8 5 1 14406 5246 +
Instead, I want a result like:
## A tibble: 2 × 2
# tpm sites
# <dbl> <list>
# 1 3 <tibble [4 × 4]>
# 2 5 <tibble [4 × 4]>
What is an idiomatic way to do this?
Upvotes: 1
Views: 94
Reputation: 886938
Wrap the bound result in list() inside summarise(), so that each group yields a single list element:
library(dplyr)
df_ex %>%
  group_by(tpm) %>%
  summarise(sites = list(bind_rows(sites)), .groups = 'drop')
Output:
# A tibble: 2 × 2
tpm sites
<dbl> <list>
1 3 <tibble [4 × 4]>
2 5 <tibble [4 × 4]>
NOTE: Using rbind from base R can run into problems when the list elements don't all have the same column names, whereas bind_rows matches columns by name and fills in NA for any column that is missing from one of the list elements.
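As a rough illustration of the difference between rbind and bind_rows, here is a minimal sketch. The tibbles a and b are made up for this example and are not part of the question's data; they only partly share column names.

```r
library(dplyr)

# Two hypothetical tibbles whose columns only partly overlap
a <- tibble(chr = "1", pos = 100L)
b <- tibble(chr = "1", cov = 7L)

# Base rbind() refuses to combine data frames whose column names differ,
# so the error is caught here and replaced with a marker string
rbind_result <- tryCatch(rbind(a, b), error = function(e) "error")

# bind_rows() matches columns by name and fills the gaps with NA
combined <- bind_rows(a, b)
combined
```

Here combined keeps both rows and has the three columns chr, pos and cov, with NA wherever a column was absent from the original tibble.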
Upvotes: 2