Reputation: 274
I need to generate and save multiple files from the randomization of a data frame. The original data frames hold daily weather data for several years. I need to generate files in which the years are randomly reordered while the sequence of rows within each year is kept.
I have written some simple code for randomizing the years, but I am having trouble repeating the randomization and saving each randomized data frame as a separate file.
This is what I have thus far:
# Create example data frame
df <- data.frame(x=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6)
df$z <- c("A","A","A","B","B","B","C","C","C","D","D","D","F","F","F","G","G","G","H","H","I","I")
set.seed(30)
# Split data frame based on info in one column (i.e. df$x) and store in a list
dt_list <- split(df, f = df$x)
# RANDOMIZE data list -- Create a new index and change the order of dt_list
# SAVE the result to "random list" (i.e. 'rd_list')
rd_list <- dt_list[sample(1:length(dt_list), length(dt_list))]
# Put back together data in the order established in 'rd_list'
rd_data <- do.call(rbind, rd_list)
This randomizes the data frame just as I need, but I don't know how to "save & repeat" so that I get multiple files, say about 20, each named after the original plus a sequential number (e.g. df_1, df_2 ...).
Also, being random samples, it's possible to get repetitions. Is there any way to automatically discard repeated files?
Thanks!
Upvotes: 0
Views: 261
Reputation: 6277
Here's an approach that makes use of a while loop and the handy sample_n() function from the dplyr package, which samples a specified number of rows from a data frame (with or without replacement).
library(dplyr)
# Create the data
weather_data <- data.frame(Weather = c("Sunny", "Cloudy", "Rainy", "Sunny"),
                           Temperature = c(75, 68, 71, 76))
# Twenty times, repeatedly sample rows from the data and write to a csv file
total_files <- 20
df_index <- 1
while (df_index <= total_files) {
  # Get a sample of the data
  sampled_subset <- sample_n(weather_data,
                             size = 10,
                             replace = TRUE)
  # Write the data to a csv file
  # (write.csv() always uses a comma separator; passing sep only triggers a warning)
  filename_to_use <- paste0("Sample_Data", "_", df_index, ".csv")
  write.csv(x = sampled_subset,
            file = filename_to_use)
  # Move on to the next file number
  df_index <- df_index + 1
}
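If the goal is to shuffle whole years rather than individual rows, the same loop idea can be combined with the split() approach from the question. Below is a minimal base R sketch, not a definitive solution. It assumes that each distinct order of the years should produce exactly one file, so a draw is skipped when its order has already been used. The names out_dir, seen_orders and files, and the df_1.csv style file names, are placeholders I made up for illustration.

```r
# Example data from the question; x plays the role of the year column
df <- data.frame(
  x = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8),
  y = c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6),
  z = c("A","A","A","B","B","B","C","C","C","D","D","D",
        "F","F","F","G","G","G","H","H","I","I")
)

set.seed(30)

dt_list     <- split(df, f = df$x)  # one data frame per year
total_files <- 20
out_dir     <- tempdir()            # placeholder output folder; change as needed
seen_orders <- character(0)         # year orders that were already written
files       <- character(0)         # paths of the files written so far

# Guard against an endless loop: there are only n! distinct orders of n years
stopifnot(total_files <= factorial(length(dt_list)))

while (length(seen_orders) < total_files) {
  # Draw a random order of the years and turn it into a comparable key
  year_order <- sample(names(dt_list))
  key <- paste(year_order, collapse = "-")

  # Discard the draw if this exact order was used before
  if (key %in% seen_orders) next
  seen_orders <- c(seen_orders, key)

  # Put whole years back together in the new order
  rd_data <- do.call(rbind, dt_list[year_order])
  rownames(rd_data) <- NULL

  # Write df_1.csv, df_2.csv, ... into out_dir
  out_file <- file.path(out_dir, paste0("df_", length(seen_orders), ".csv"))
  write.csv(rd_data, file = out_file, row.names = FALSE)
  files <- c(files, out_file)
}
```

Comparing the year orders instead of the written files keeps the duplicate check cheap. It relies on every year having different content, which holds here because the year column itself differs between groups.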
Upvotes: 2