dan

Split a data.frame by group into a list of vectors rather than a list of data.frames

I have a data.frame which maps an id column to a group column, and the id column is not unique because the same id can map to multiple groups:

set.seed(1)
df <- data.frame(id = paste0("id", sample(1:10, 300, replace = TRUE)),
                 group = rep(c("A", "B", "C"), each = 100),  # 100 rows per group
                 stringsAsFactors = FALSE)

I'd like to convert this data.frame into a list where each element holds the ids belonging to one group.

This works, but it seems slow for the size of the data I'm working with:

library(dplyr)
# one full filter() pass over df for every group
df.list <- lapply(unique(df$group), function(g) dplyr::filter(df, group == g)$id)

So I was considering group_split() instead:

df.list <- df %>%
  dplyr::group_by(group) %>%
  dplyr::group_split()

Assuming it is faster than my first option, how can I get it to return the same output as the first option (a list of id vectors) rather than a list of data.frames?

Answers (1)

akrun

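Use base R's split() alone. It should be faster than subsetting with == once per unique group, because split() groups the whole vector in a single call instead of scanning the data once for each group: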

with(df, split(id, group))
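A quick check of what split() returns, using the question's data. The lapply() line is an assumption: a base-R stand-in for the question's dplyr::filter() version, used so the sketch runs without dplyr. split() names the list by group and orders it by the sorted factor levels, whereas lapply(unique(...)) follows the order of first appearance, so the two only line up directly when the groups happen to be sorted.

```r
set.seed(1)
df <- data.frame(id = paste0("id", sample(1:10, 300, replace = TRUE)),
                 group = rep(c("A", "B", "C"), each = 100),
                 stringsAsFactors = FALSE)

# Named list, one character vector of ids per group, in sorted level order
by_split <- with(df, split(id, group))

# Base-R stand-in (assumption) for the question's dplyr::filter() version
by_lapply <- lapply(unique(df$group), function(g) df$id[df$group == g])

# Same vectors once names are dropped, since the groups here are already sorted
stopifnot(identical(unname(by_split), by_lapply))

# When groups are not sorted, split() reorders them alphabetically ...
sorted <- split(c("id1", "id2", "id3"), c("B", "A", "B"))
# ... unless the factor levels are set to first-appearance order
first_seen <- split(c("id1", "id2", "id3"),
                    factor(c("B", "A", "B"), levels = c("B", "A")))
```

Here names(sorted) is c("A", "B") and names(first_seen) is c("B", "A"). On real data, split(id, factor(group, levels = unique(group))) reproduces the ordering of the lapply(unique(...)) approach.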

Or, with the tidyverse, we can pull the column after group_split(). group_split() returns a list of data.frames/tibbles, so it could be slower than the split()-only method above. We can still trim some overhead by dropping the group column (keep = FALSE, renamed .keep in dplyr 1.0.0) and then pulling the 'id' column from each list element to get a list of vectors:

library(dplyr)
library(purrr)
df %>%
  group_split(group, .keep = FALSE) %>%  # list of tibbles; `.keep` was `keep` before dplyr 1.0.0
  map(~ pull(.x, id))                    # pull the id vector out of each tibble

Or wrap the call in {} so that %>% does not insert df as the first argument, and refer to it explicitly as .:

df %>%
    {split(.$id, .$group)}

Or pipe into with():

df %>%
     with(., split(id, group))
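To check the speed claim on data closer to your real size, here is a rough timing sketch in base R. The row count, number of ids and number of groups are made-up assumptions, and the lapply()/== loop stands in for the question's filter() approach. Timings depend on the machine and the data, so measure rather than rely on any particular figure.

```r
set.seed(1)
n <- 5e5  # hypothetical size; adjust to match your data
big <- data.frame(id = paste0("id", sample(1:1000, n, replace = TRUE)),
                  group = sample(paste0("g", 1:50), n, replace = TRUE),
                  stringsAsFactors = FALSE)

# Scans all n rows once per group (50 full passes)
t_loop <- system.time(
  res_loop <- lapply(unique(big$group), function(g) big$id[big$group == g])
)

# One split() call; factor levels keep first-appearance order to match res_loop
t_split <- system.time(
  res_split <- with(big, split(id, factor(group, levels = unique(group))))
)

# Both approaches must return the same id vectors before timings are compared
stopifnot(identical(unname(res_split), res_loop))
print(c(loop = t_loop[["elapsed"]], split = t_split[["elapsed"]]))
```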
