Split dataframe into certain number of groups in R

Question

I have a dataframe with 285000 records and I want to split it into 10 dataframes that I can save and access easily. I am trying to split it like this, but I am not sure how to save all the dataframes separately:

groups <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10")
X <- split(data5, f = groups)

Like this I only receive one subset dataframe.

Serkan · Accepted Answer

If you want to split your data and save it separately, I would recommend the following approach using the tidyverse.

Split the data

# libraries;
library(tidyverse)
library(data.table)

# split data according to some
# variable and store

data_list <- mtcars %>% split(
        f = .$cyl
) %>% set_names(
        nm = paste("cylinder", names(.), sep = "")
)

Here, f = .$cyl refers to the grouping variable in the dataset of interest. In this example I've split the data according to cyl in mtcars.

split creates one list element for each distinct value of the grouping variable. In this case that is 4, 6 and 8 cylinders.
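To see what split returns before any renaming, here is a minimal base R sketch on the same mtcars data. The variable name parts is only illustrative and does not appear in the answer's script.

```r
# split() returns a named list with one data frame per distinct value of cyl
parts <- split(mtcars, mtcars$cyl)

# the list names are the values of the grouping variable, as strings
names(parts)

# number of rows that ended up in each piece
sapply(parts, nrow)
```

Each element of parts is an ordinary data frame, so parts[["4"]] can be used like any other data frame.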

I proceed with set_names from purrr to name each element of the list accordingly.

Saving the data

# store and save locally
# by using map

map(
        .x = seq_along(data_list),
        .f = function(i) {

                # build the file name from the name of the list element
                path <- paste(names(data_list)[i], ".csv", sep = "")

                # save with fwrite
                fwrite(
                        data_list[[i]],
                        file = path,
                        sep  = ";"
                )
        }
)

I use map to iterate over the list that split creates and save each element locally under the name set above. The files are written with fwrite from data.table for better performance.

Note that in the script each data frame is saved as paste(names(data_list)[i], ".csv", sep = ""), which evaluates to cylinder4.csv, cylinder6.csv and cylinder8.csv.
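Since the question also asks about accessing the pieces again, here is a rough round-trip sketch. It uses base R (write.table and read.csv) instead of fwrite so that it runs without extra packages, and it writes to a temporary folder. The names out_dir and restored and the folder name cyl_split are assumptions for illustration only.

```r
# hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "cyl_split")
dir.create(out_dir, showWarnings = FALSE)

# same split and naming as above, done with base R
data_list <- split(mtcars, mtcars$cyl)
names(data_list) <- paste0("cylinder", names(data_list))

# write one semicolon-separated file per list element
for (nm in names(data_list)) {
  write.table(
    data_list[[nm]],
    file      = file.path(out_dir, paste0(nm, ".csv")),
    sep       = ";",
    row.names = FALSE
  )
}

# read every csv in the folder back into a named list
files    <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
restored <- lapply(files, read.csv, sep = ";")
names(restored) <- tools::file_path_sans_ext(basename(files))
```

After this, restored$cylinder4 holds the rows that were written to cylinder4.csv. Row names are dropped by row.names = FALSE, so keep them in a column first if you need them.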

The same approach to your data should be readily applicable with minor changes in the script.
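If your data has no natural grouping variable and you simply want 10 pieces of roughly equal size, one possible adaptation is to build the grouping vector from the row positions. This is only a sketch: the data5 below is made-up stand-in data, and n_groups, group_id, chunks and out_dir are illustrative names, not part of the original script.

```r
# stand-in for the asker's data5 (made-up example data)
data5 <- data.frame(id = seq_len(285000), value = seq_len(285000) %% 7)

n_groups <- 10

# one group label per row: labels 1..n_groups repeated to the full length,
# then sorted so that each group is a contiguous block of rows
group_id <- sort(rep(seq_len(n_groups), length.out = nrow(data5)))

# the grouping vector has one entry per row, so split() yields n_groups pieces
chunks <- split(data5, group_id)
names(chunks) <- paste0("x", names(chunks))

# save each piece; tempdir() is only a placeholder for your own folder
out_dir <- tempdir()
for (nm in names(chunks)) {
  write.csv(chunks[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

The key difference from split(data5, f = groups) in the question is that the grouping vector here is built to have exactly one label per row, in row order, so the pieces are contiguous blocks named x1 to x10. When the row count is not divisible by n_groups, the piece sizes differ by at most one row.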

Best
