Reputation: 6304
I have a data.frame which maps an id column to a group column, and the id column is not unique because the same id can map to multiple groups:
set.seed(1)
df <- data.frame(id = paste0("id", sample(1:10, 300, replace = TRUE)), group = c(rep("A", 100), rep("B", 100), rep("C", 100)), stringsAsFactors = FALSE)
I'd like to convert this data.frame into a list where each element is the ids in each group.
This seems a bit slow for the size of data I'm working with:
library(dplyr)
df.list <- lapply(unique(df$group), function(g) dplyr::filter(df, group == g)$id)
So I was thinking about this:
df.list <- df %>%
  dplyr::group_by(group) %>%
  dplyr::group_split()
Assuming it is faster than my first option, any idea how to get it to return the same output as in the first option rather than a list of data.frames?
Upvotes: 6
Views: 1279
Reputation: 887028
Using base R only, with split. It should be faster than filtering with == over each unique group:
with(df, split(id, group))
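One difference worth noting: split returns a list named by group, while the lapply version in the question returns an unnamed list. The sketch below checks that the contents match once the names are dropped. It uses plain base subsetting (df$id[df$group == g]) as a stand-in for the dplyr::filter call, which is my substitution and not the code from the question.

```r
set.seed(1)
df <- data.frame(id = paste0("id", sample(1:10, 300, replace = TRUE)),
                 group = rep(c("A", "B", "C"), each = 100),
                 stringsAsFactors = FALSE)

# Base-R stand-in for the question's lapply/filter approach (unnamed list)
by_filter <- lapply(unique(df$group), function(g) df$id[df$group == g])

# split() returns one character vector per group, named by group
by_split <- with(df, split(id, group))

names(by_split)

# Compare contents only, ignoring the names that split() adds
same_contents <- identical(unname(by_split), by_filter)
same_contents
```

The comparison relies on the groups appearing in sorted order in the data, as they do here. split orders its elements by the sorted group levels, whereas unique(df$group) follows order of appearance, so with shuffled groups the two lists could be ordered differently.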
Or with tidyverse, we can pull the column after the group_split. group_split returns a list of data.frames/tibbles, so it could be slower than the split-only method above. Here, though, we can improve performance somewhat by dropping the group column (.keep = FALSE; older dplyr versions spelled this argument keep = FALSE, which is now deprecated) and then, within the list, pulling the 'id' column to create the list of vectors:
library(dplyr)
library(purrr)
df %>%
  group_split(group, .keep = FALSE) %>%
  map(~ .x %>%
    pull(id))
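Note that group_split does not name the elements of its result. If the group labels are needed, one possible way to recover them is group_keys, as in the sketch below. The tiny data.frame and the variable names grouped and ids are made up for illustration, and it assumes a dplyr version that has the .keep argument.

```r
library(dplyr)
library(purrr)

# Small illustrative data (not the data from the question)
df <- data.frame(id = c("id1", "id2", "id1", "id3"),
                 group = c("A", "A", "B", "B"),
                 stringsAsFactors = FALSE)

grouped <- group_by(df, group)

# One character vector of ids per group; the result is an unnamed list
ids <- grouped %>%
  group_split(.keep = FALSE) %>%
  map(~ pull(.x, id))

# group_keys() lists the groups in the same order group_split() uses
names(ids) <- group_keys(grouped)$group
ids
```

This gives the same shape as the split result, at the cost of an extra step.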
Or use {} with the pipe:
df %>%
  {split(.$id, .$group)}
Or wrap it in with:
df %>%
  with(., split(id, group))
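To check the speed claim on your own data, a rough timing sketch along these lines may help. It uses only base R; the data size, the number of groups, and the name big are arbitrary choices for illustration, and the per-group loop uses base subsetting as a stand-in for the dplyr::filter version. Timings will vary by machine, so treat the numbers as indicative only.

```r
set.seed(1)
n <- 1e5
big <- data.frame(id = paste0("id", sample(1:1000, n, replace = TRUE)),
                  group = sample(LETTERS, n, replace = TRUE),
                  stringsAsFactors = FALSE)

# Sorted so the loop's order matches the order split() produces
groups <- sort(unique(big$group))

# One full scan of the group column per group
t_loop <- system.time(
  res_loop <- lapply(groups, function(g) big$id[big$group == g])
)

# A single pass that partitions all ids at once
t_split <- system.time(
  res_split <- split(big$id, big$group)
)

# Elapsed seconds for each approach
c(loop = t_loop[["elapsed"]], split = t_split[["elapsed"]])

# Both approaches should produce the same vectors
stopifnot(identical(unname(res_split), res_loop))
```

For more reliable numbers, a dedicated benchmarking package that repeats each expression many times would be a better fit than a single system.time call.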
Upvotes: 7