atsyplenkov

Reputation: 1304

Split by variable in a specific group using dplyr

How can I both split and duplicate a data frame using dplyr? Suppose I have a data frame with a grouping variable (group), a sample ID (sample), and a value.

library(tidyverse)

df <- tibble(group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
       sample = paste0("sample", seq(1, 11)),
       value = rnorm(11, 20, sd = 30))

I need to turn this data frame into two data frames, one for each sample in the mix group. The first should be the whole data frame without the sample11 row, the second without the sample10 row. Something like the following works, but I'd like a more modern way. I believe there's a function for this.

list(
  df1 = df %>% filter(sample != "sample10"),
  df2 = df %>% filter(sample != "sample11")
)

I need to do this for tens of target samples and then map a function over every resulting data frame.

Upvotes: 1

Views: 173

Answers (4)

Aimé Okoko

Reputation: 11

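You can use dlply() from the plyr package ("dl" stands for "data frame to list"). Note that this splits df by sample, so each list element holds the rows for one sample.

# plyr:: prefix avoids attaching plyr on top of dplyr
my_list <- plyr::dlply(df, "sample")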

Upvotes: 1

Darren Tsai

Reputation: 35604

Try this, which drops each "mix" row in turn by its row index:

lapply(which(df$group == "mix"), function(x) df[-x, ])

The same in pipe form:

df %>%
  { which(.$group == "mix") } %>%
  map(~ df[-., ])

Upvotes: 2

camille

Reputation: 16871

To repeat the filtering for all sample labels, I'd take the unique sample values, map along that, and filter to exclude each one.

library(dplyr)

df_list <- unique(df$sample) %>%
  purrr::map(~filter(df, sample != .))
df_list[1]
#> [[1]]
#> # A tibble: 10 x 3
#>    group sample    value
#>    <chr> <chr>     <dbl>
#>  1 B     sample2   -7.49
#>  2 C     sample3   34.1 
#>  3 A     sample4   61.4 
#>  4 B     sample5   51.9 
#>  5 C     sample6   15.7 
#>  6 A     sample7  -20.6 
#>  7 B     sample8   39.8 
#>  8 C     sample9   47.6 
#>  9 mix   sample10  37.3 
#> 10 mix   sample11  14.4

Better yet, name the data frames to show which sample was excluded:

df_list_named <- unique(df$sample) %>%
  purrr::set_names(paste, "excluded", sep = "_") %>%
  purrr::map(~filter(df, sample != .))
df_list_named[1]
#> $sample1_excluded
#> # A tibble: 10 x 3
#>    group sample    value
#>    <chr> <chr>     <dbl>
#>  1 B     sample2   -7.49
#>  2 C     sample3   34.1 
#>  3 A     sample4   61.4 
#>  4 B     sample5   51.9 
#>  5 C     sample6   15.7 
#>  6 A     sample7  -20.6 
#>  7 B     sample8   39.8 
#>  8 C     sample9   47.6 
#>  9 mix   sample10  37.3 
#> 10 mix   sample11  14.4

From there, call map again (or another iteration function) to apply further functions to each data frame.
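
As a minimal sketch of that last step: the question does not say which function should be applied, so the mean of value is used here purely as an illustrative stand-in, and the name means is hypothetical. purrr::map_dbl() returns one number per data frame and keeps the list names.

```r
library(dplyr)
library(purrr)

set.seed(1)  # only so the sketch is reproducible
df <- tibble(group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
             sample = paste0("sample", seq(1, 11)),
             value = rnorm(11, 20, sd = 30))

# same leave-one-sample-out list as above, named by the excluded sample
df_list_named <- unique(df$sample) %>%
  set_names(paste, "excluded", sep = "_") %>%
  map(~ filter(df, sample != .x))

# illustrative summary (an assumption): mean of `value` in each data frame
means <- map_dbl(df_list_named, ~ mean(.x$value))
```

Swap map_dbl() for map() if the function returns something other than a single number, such as a fitted model.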

Upvotes: 0

Sotos

Reputation: 51592

You can try:

lapply(c('sample10', 'sample11'), function(i) df[!df$sample %in% i, ])
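
With tens of target samples, typing the names out gets tedious. A sketch along the same lines, assuming (as in the question) that the targets are exactly the samples in the "mix" group, derives them from df instead. The names targets and res are illustrative, and the value column holds placeholder numbers rather than the rnorm() draws.

```r
df <- data.frame(
  group = c(rep(LETTERS[1:3], 3), "mix", "mix"),
  sample = paste0("sample", 1:11),
  value = seq(10, 110, by = 10),  # placeholder values, not the original data
  stringsAsFactors = FALSE
)

# target samples: every sample that belongs to the "mix" group
targets <- df$sample[df$group == "mix"]

# one data frame per target, each with that target's row removed;
# setNames() labels each list element with the sample it excludes
res <- lapply(setNames(targets, targets), function(i) df[!df$sample %in% i, ])
```

Naming the list elements makes it easier to tell which sample was left out once a function has been applied to each one.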

Upvotes: 1
