Reputation: 329
I have a dataframe that needs to be split into individual files based on the value of a variable in the dataframe. The dataframe contains scores of individuals and confidential information, so a simplified example is below. I want the split to be based on the variable "first".
first <- c("Jon", "Bill", "Bill" , "Maria", "Ben", "Tina")
age <- c(23, 41, 41 , 32, 58, 26)
df <- data.frame(first , age)
df
For example, I want the file with Jon to have one line and the file with Bill to have two lines. I've attempted the following but I'm stuck. I don't know how to get individual dataframes from the list df.split.
library(tidyverse)
df.grped <- df %>%
  group_by(first)
df.split <- group_split(df.grped)
So I would like to have the files: df.split_Jon, df.split_Bill, df.split_Maria, etc. The actual source file is large so I don't want to specify each.
Since I understand working in tidyverse the best I'd like to have the solution there, if possible. Thanks for any help!!
Upvotes: 4
Views: 776
Reputation: 18551
Here is another option using {purrr} and {rlang}. We first split the data.frame with dplyr::group_split, then name it with purrr::set_names and map_chr, and then assign it with rlang::env_bind, in which we can splice it using !!!:
library(tidyverse)
df %>%
group_split(first) %>%
set_names(map_chr(., ~ paste0("df_", .$first[[1]]))) %>%
rlang::env_bind(.GlobalEnv, !!! .)
ls()
#> [1] "age" "df" "df_Ben" "df_Bill" "df_Jon" "df_Maria" "df_Tina"
#> [8] "first"
Created on 2022-01-02 by the reprex package (v0.3.0)
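To see the naming step in isolation, here is a minimal sketch that stops before env_bind and keeps the pieces in a named list instead. It rebuilds the example data from the question; the variable name `pieces` is only illustrative and not part of the answer above.

```r
library(dplyr)
library(purrr)

# example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

# split into one tibble per value of `first`
pieces <- df %>%
  group_split(first)

# name each piece after the first value in its own `first` column
pieces <- set_names(pieces, map_chr(pieces, ~ paste0("df_", .x$first[[1]])))

names(pieces)        # one name per group
nrow(pieces$df_Bill) # Bill appears twice in the example data
```

Keeping the list and indexing it by name (for example `pieces$df_Bill`) may be enough if you would rather not create many objects in the global environment.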
Upvotes: 1
Reputation: 7106
Another alternative:
library(tidyverse)
df %>%
group_split(first) %>%
walk(~ assign(str_c("df.split_", .[1, 1]), value = ., envir = .GlobalEnv))
names(.GlobalEnv)
#> [1] "df.split_Bill" "first" "df.split_Maria" "df.split_Ben"
#> [5] "df.split_Tina" "age" "df.split_Jon" "df"
Created on 2022-01-01 by the reprex package (v2.0.1)
Upvotes: 2
Reputation: 21908
After splitting the data set by the unique values of the first column, we use the list2env function to create a separate dataframe for each subset in the global environment as follows:
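library(tidyverse)
# group_split() returns the groups ordered by key, not by first appearance,
# so take the names from group_keys() rather than unique(df$first)
df.grped <- df %>% group_by(first)
setNames(group_split(df.grped), paste0("df.split_", group_keys(df.grped)$first)) %>%
  list2env(envir = globalenv())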
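One thing worth checking with this kind of approach is that each name lines up with the piece it labels: group_split() returns the groups ordered by key, whereas unique() returns values in order of first appearance. A minimal sketch of that check on the example data from the question (the names `grouped`, `pieces`, `keys`, and `actual` are only illustrative):

```r
library(dplyr)

# example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

grouped <- df %>% group_by(first)
pieces <- group_split(grouped)
keys <- group_keys(grouped)$first # keys in the same order as the pieces

# the value of `first` actually stored in each piece
actual <- vapply(pieces, function(p) p$first[[1]], character(1))

identical(actual, keys)             # do the group keys match the contents?
identical(actual, unique(df$first)) # does order of appearance match the contents?
```

If the second comparison is FALSE for your data, names built from unique() would label the wrong dataframes, so building them from the group keys is the safer choice.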
Upvotes: 5