Reputation: 1304
How can I both split and duplicate a data frame using dplyr? Suppose I have a data frame with a grouping variable (group), a sample id (sample), and a value.
library(tidyverse)
df <- tibble(group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
             sample = paste0("sample", seq(1, 11)),
             value = rnorm(11, 20, sd = 30))
I need to turn this data frame into two data frames, one for each sample in the mix group. The first is the whole data frame without the sample11 row, the second is the whole data frame without the sample10 row. Something like this, but in a more modern way (I believe there's a function for this):
list(
  df1 = df %>% filter(sample != "sample10"),
  df2 = df %>% filter(sample != "sample11")
)
I need to do this for dozens of target samples and then map a function over every resulting data frame.
Upvotes: 1
Views: 173
Reputation: 11
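You can use dlply() from the plyr package ("dl" is for "data frame to list"). Note that this splits df into one single-row data frame per sample, rather than dropping one sample at a time.
my_list <- df %>% plyr::dlply("sample")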
Upvotes: 1
Reputation: 35604
Try this:
lapply(which(df$group == "mix"), function(x) df[-x, ])
In pipe form:
df %>%
  { which(.$group == "mix") } %>%
  map(~ df[-., ])
Upvotes: 2
Reputation: 16871
To repeat the filtering for all sample labels, I'd take the unique sample values, map along that, and filter to exclude each one.
library(dplyr)
df_list <- unique(df$sample) %>%
  purrr::map(~filter(df, sample != .))
df_list[1]
#> [[1]]
#> # A tibble: 10 x 3
#> group sample value
#> <chr> <chr> <dbl>
#> 1 B sample2 -7.49
#> 2 C sample3 34.1
#> 3 A sample4 61.4
#> 4 B sample5 51.9
#> 5 C sample6 15.7
#> 6 A sample7 -20.6
#> 7 B sample8 39.8
#> 8 C sample9 47.6
#> 9 mix sample10 37.3
#> 10 mix sample11 14.4
Better yet, name the data frames to show which sample was excluded:
df_list_named <- unique(df$sample) %>%
  purrr::set_names(paste, "excluded", sep = "_") %>%
  purrr::map(~filter(df, sample != .))
df_list_named[1]
#> $sample1_excluded
#> # A tibble: 10 x 3
#> group sample value
#> <chr> <chr> <dbl>
#> 1 B sample2 -7.49
#> 2 C sample3 34.1
#> 3 A sample4 61.4
#> 4 B sample5 51.9
#> 5 C sample6 15.7
#> 6 A sample7 -20.6
#> 7 B sample8 39.8
#> 8 C sample9 47.6
#> 9 mix sample10 37.3
#> 10 mix sample11 14.4
From there, call another map() or similar to apply further functions.
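As a rough sketch of that follow-up step: the per-group mean below is only my assumption about what "further functions" might be (the question doesn't say), and the seed and the names samples and group_means are made up for illustration.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- tibble(group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
             sample = paste0("sample", seq(1, 11)),
             value = rnorm(11, 20, sd = 30))

# Same idea as above: one data frame per excluded sample, named accordingly
samples <- unique(df$sample)
df_list_named <- samples %>%
  set_names(paste0(samples, "_excluded")) %>%
  map(~ filter(df, sample != .x))

# Hypothetical follow-up: mean of value per group in each data frame,
# stacked into one tibble with the list names kept in an "excluded" column
group_means <- df_list_named %>%
  map(~ .x %>% group_by(group) %>% summarise(mean_value = mean(value))) %>%
  bind_rows(.id = "excluded")
```

Keeping the list named pays off here, since bind_rows(.id = ...) carries the names through, so each summary row still says which sample was left out.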
Upvotes: 0
Reputation: 51592
You can try:
lapply(c('sample10', 'sample11'), function(i) df[!df$sample %in% i, ])
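If typing out dozens of targets is a pain, the same base R approach can probably take them from the data instead. A minimal sketch, assuming the targets are exactly the samples in the mix group; the names targets and res, the "without_" prefix, and the fixed seed are mine, not from the question.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
                 sample = paste0("sample", seq(1, 11)),
                 value = rnorm(11, 20, sd = 30),
                 stringsAsFactors = FALSE)

# Target samples: every sample belonging to the "mix" group
targets <- df$sample[df$group == "mix"]

# One data frame per target, each with that target's row dropped;
# naming the input vector gives a named result list
res <- lapply(setNames(targets, paste0("without_", targets)),
              function(i) df[!df$sample %in% i, ])

# Placeholder for the real per-data-frame function: count remaining rows
row_counts <- sapply(res, nrow)
```

Swap the sapply() call for whatever function needs to run on each data frame.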
Upvotes: 1