Reputation: 187
I have a dataframe with 285000 records and I want to split it into 10 dataframes that I can save and access easily. I am trying to split it like this, but I am not sure how to save all the dataframes separately:
groups <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10")
X <- split(data5, f = groups)
Like this I only receive one subset dataframe.
Upvotes: 0
Views: 1356
Reputation: 1955
If you want to split your data and save it separately, I would recommend the following approach using tidyverse.
Split the data
# libraries
library(tidyverse)
library(data.table)
# split data according to some
# variable and store
data_list <- mtcars %>%
  split(f = .$cyl) %>%
  set_names(nm = paste("cylinder", names(.), sep = ""))
Here, f = .$cyl refers to the grouping variable in your dataset of interest. In this example I've split the data according to cyl in mtcars.
split returns one data frame per level of the grouping variable, in this case 4, 6 and 8 cylinders.
I proceed with set_names from purrr to name each element of the list accordingly.
Saving the data
# store and save locally
# by using map
map(
  .x = seq_along(data_list),
  .f = function(i) {
    # set name of data to save locally
    path <- paste(names(data_list[i]), ".csv", sep = "")
    # save with fwrite
    fwrite(
      data_list[[i]],
      file = path,
      sep = ";"
    )
  }
)
I use map to iterate over the entire length of the list that split creates, and save each element locally under the names we set above, using fwrite from data.table for better performance.
Note that in the script each data frame is saved as paste(names(data_list[i]), ".csv", sep = ""), which evaluates to cylinder4.csv, cylinder6.csv and cylinder8.csv.
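To access the saved files again later, something along these lines could work. This is a base R sketch rather than the code above: the folder split_demo under tempdir() and the names out_dir and restored are made up for illustration, and write.table/read.csv stand in for fwrite (fread would be the data.table counterpart for reading).

```r
# Hypothetical output folder, used only for this illustration
out_dir <- file.path(tempdir(), "split_demo")
dir.create(out_dir, showWarnings = FALSE)

# Same split as above, done with base R
data_list <- split(mtcars, mtcars$cyl)
names(data_list) <- paste0("cylinder", names(data_list))

# Write one semicolon-separated file per list element
for (nm in names(data_list)) {
  write.table(
    data_list[[nm]],
    file = file.path(out_dir, paste0(nm, ".csv")),
    sep = ";",
    row.names = FALSE
  )
}

# Read every file back into a list named after the files
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
restored <- lapply(files, read.csv, sep = ";")
names(restored) <- tools::file_path_sans_ext(basename(files))
```

After this, restored$cylinder4 and so on should hold the individual data frames again.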
The same approach to your data should be readily applicable with minor changes in the script.
Best
Upvotes: 2
Reputation: 882
If you want to arbitrarily split a big dataframe into little ones, you can add to the dataframe a uniformly distributed grouping variable, then use split.
df <- data.frame(group = rep(1:3, 4),
val = runif(12))
df
   group       val
1      1 0.5883321
2      2 0.5704967
3      3 0.7866597
4      1 0.8685778
5      2 0.6580090
6      3 0.1036386
7      1 0.7858867
8      2 0.2679281
9      3 0.2577965
10     1 0.6040585
11     2 0.6987716
12     3 0.2328914
split(df, df$group)
$`1`
   group       val
1      1 0.5883321
4      1 0.8685778
7      1 0.7858867
10     1 0.6040585

$`2`
   group       val
2      2 0.5704967
5      2 0.6580090
8      2 0.2679281
11     2 0.6987716

$`3`
   group       val
3      3 0.7866597
6      3 0.1036386
9      3 0.2577965
12     3 0.2328914
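For the case in the question (285000 rows, 10 pieces), the same idea might look roughly like this. It is only a sketch: big_df is a stand-in data frame generated on the spot, and n_parts, chunk_id, parts and the .rds file names are hypothetical choices, not anything from the original post.

```r
set.seed(1)
n_parts <- 10                                              # number of pieces wanted
big_df  <- data.frame(id = 1:285000, val = runif(285000))  # stand-in for the real data

# Label consecutive rows 1, 1, ..., 2, 2, ... so each chunk has (nearly) equal size
n        <- nrow(big_df)
chunk_id <- rep(seq_len(n_parts), each = ceiling(n / n_parts), length.out = n)

# One data frame per chunk, named x1 ... x10
parts        <- split(big_df, chunk_id)
names(parts) <- paste0("x", names(parts))

# Save each piece to its own file in a temporary folder
for (nm in names(parts)) {
  saveRDS(parts[[nm]], file = file.path(tempdir(), paste0(nm, ".rds")))
}
```

A single piece can then be loaded again with readRDS(file.path(tempdir(), "x1.rds")). Labelling consecutive rows keeps the original row order within each piece; using rep(1:10, length.out = n) instead would interleave the rows across pieces.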
Upvotes: 0