Reputation: 3195
I have a folder with many datasets:
C:/path/folder
The folder has subfolders:
/1
/2
/3
...
Each subfolder has 1-20 csv files.
I need to merge all the csv files from the subfolders into one csv file, and each observation must be marked with the subfolder it came from.
For example, if I merge the csv files from subfolder 1 and subfolder 2, I get:
newdata=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"),
Revenue = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25, 5.92,
6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25, 6.25, 5.92,
6.25, 6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date",
"Revenue", "Budget"), class = "data.frame", row.names = c(NA,
-20L))
This is not quite right: I need each observation labeled with the number of the subfolder it comes from. So the desired output is:
Date Revenue Budget subfolder
02.01.2018 0 6,25 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 5,92 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 6,25 1
02.01.2018 0 6,25 2
02.01.2018 0 5,92 2
02.01.2018 0 6,25 2
02.01.2018 0 6,25 2
02.01.2018 0 5,92 2
02.01.2018 0 5,92 2
02.01.2018 0 5,92 2
02.01.2018 0 6,25 2
02.01.2018 0 5,92 2
So observations 1:11 were taken from subfolder 1 and observations 12:20 were taken from subfolder 2.
Here is the data from each subfolder separately. Subfolder 1:
C:/path/folder/subfolder1
f1=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25,
5.92, 6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25)), .Names = c("Date",
"Revenue", "Budget"), class = "data.frame", row.names = c(NA,
-11L))
C:/path/folder/subfolder2
f2=
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 5.92, 6.25,
6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date", "Revenue",
"Budget"), class = "data.frame", row.names = c(NA, -9L))
Upvotes: 0
Views: 83
Reputation: 590
Imagine you have the following folder structure:
master
|
+-- folder1
|   |
|   +-- file1.csv
|   +-- file2.csv
+-- folder2
    |
    +-- file1.csv
    +-- file2.csv
and your working directory is "master", then you can do the following:
# List the immediate subdirectories of master (the working directory)
dirs <- list.dirs(full.names = FALSE, recursive = FALSE)
# This creates the empty data frame that will be filled
all_data <- data.frame(Date = character(),
                       Revenue = integer(),
                       Budget = numeric(),
                       dirname = character(),
                       stringsAsFactors = FALSE)
# Loops over directories
for (dirname in dirs) {
  # Get all csv files inside this directory, with their relative paths
  all_csv <- list.files(dirname, pattern = "[.]csv$", full.names = TRUE)
  # Loops over files in the directory
  for (file in all_csv) {
    # read.csv assumes comma-separated files with a header row;
    # use read.csv2 for semicolon-separated files with decimal commas
    tempdata <- read.csv(file, stringsAsFactors = FALSE)
    # Mark every row with the subfolder it came from
    tempdata$dirname <- dirname
    all_data <- rbind(all_data, tempdata)
  }
}
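If there are many files, growing a data frame with rbind inside a loop can get slow. A possible alternative, as a minimal sketch only, is to read everything into lists and bind once at the end. The function name read_subfolders and the column name subfolder are made up for illustration, and the code assumes comma-separated files with a header row and identical columns in every file:

```r
# Hypothetical helper: reads every csv in each immediate subfolder of `root`
# and tags each row with the name of the subfolder it came from.
read_subfolders <- function(root = ".") {
  # Immediate subdirectories only, with full paths
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  per_dir <- lapply(dirs, function(d) {
    files <- list.files(d, pattern = "[.]csv$", full.names = TRUE)
    per_file <- lapply(files, function(f) {
      # Assumption: comma-separated with a header; swap in read.csv2 if needed
      dat <- read.csv(f, stringsAsFactors = FALSE)
      # rep() keeps this safe for files that contain a header but no rows
      dat$subfolder <- rep(basename(d), nrow(dat))
      dat
    })
    # Returns NULL for a subfolder without csv files; rbind ignores NULL
    do.call(rbind, per_file)
  })
  do.call(rbind, per_dir)
}
```

Calling merged <- read_subfolders("C:/path/folder") and then write.csv(merged, "merged.csv", row.names = FALSE) would write the combined data to a single file; the output file name here is only an example.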
Upvotes: 1