psysky

Reputation: 3195

Merge datasets into one csv file and record the source subfolder in R

I have a folder with many datasets:

C:/path/folder

The folder has subfolders:

/1
/2
/3
...

Each subfolder has 1-20 csv files.

I need to merge all csv files from the subfolders of this folder into one csv file, but each observation must be marked with the subfolder it came from.

For example, if I merge the csv files from subfolder 1 and subfolder 2, I get:

newdata=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"), 
    Revenue = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25, 5.92, 
    6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25, 6.25, 5.92, 
    6.25, 6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date", 
"Revenue", "Budget"), class = "data.frame", row.names = c(NA, 
-20L))

This is not quite what I need: each observation must also carry the number of the subfolder it came from. So the desired output is:

Date    Revenue Budget  subfolder
02.01.2018  0   6,25    1
02.01.2018  0   6,25    1
02.01.2018  0   5,92    1
02.01.2018  0   6,25    1
02.01.2018  0   5,92    1
02.01.2018  0   6,25    1
02.01.2018  0   5,92    1
02.01.2018  0   5,92    1
02.01.2018  0   5,92    1
02.01.2018  0   6,25    1
02.01.2018  0   6,25    1
02.01.2018  0   6,25    1
02.01.2018  0   5,92    2
02.01.2018  0   6,25    2
02.01.2018  0   6,25    2
02.01.2018  0   5,92    2
02.01.2018  0   5,92    2
02.01.2018  0   5,92    2
02.01.2018  0   6,25    2
02.01.2018  0   5,92    2

So observations 1:12 were taken from subfolder 1 and observations 13:20 were taken from subfolder 2.

The data from each subfolder separately:

C:/path/folder/subfolder1

f1=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25, 
5.92, 6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25)), .Names = c("Date", 
"Revenue", "Budget"), class = "data.frame", row.names = c(NA, 
-11L))

C:/path/folder/subfolder2

f2=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 5.92, 6.25, 
6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date", "Revenue", 
"Budget"), class = "data.frame", row.names = c(NA, -9L))

Upvotes: 0

Views: 83

Answers (1)

Stanislaus Stadlmann

Reputation: 590

Imagine you have the following folder structure:

master
 |
 +-- folder1
     | 
     +-- file1.csv
     +-- file2.csv
 +-- folder2
     |
     +-- file1.csv
     +-- file2.csv

and your working directory is "master", then you can do the following:

# List the immediate subdirectories of master (names only)
dirs <- list.dirs(recursive = FALSE, full.names = FALSE)

# Empty data frame that will be filled as files are read
all_data <- data.frame(Date = character(),
                       Revenue = integer(),
                       Budget = numeric(),
                       dirname = character(),
                       stringsAsFactors = FALSE)

# Loop over the subdirectories
for (dirname in dirs) {
  # Full paths of all csv files inside this subdirectory
  all_csv <- list.files(dirname, pattern = "\\.csv$", full.names = TRUE)

  # Loop over the files in the subdirectory
  for (file in all_csv) {
    # read.table splits on whitespace by default; set sep (e.g. sep = ",") to match your files
    tempdata <- read.table(file, stringsAsFactors = FALSE, header = TRUE)
    # Mark every row with the subdirectory it came from
    tempdata$dirname <- dirname
    all_data <- rbind(all_data, tempdata)
  }
}
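
The question also asks for the result as a single csv file, which the loop does not write out. Below is a minimal sketch of the same idea wrapped in a function, followed by a write.csv call. The helper name merge_subfolder_csvs, its reader argument, the demo folder under tempdir() and the output name merged.csv are all made up for illustration, and it assumes comma-separated files with a header row (pass reader = read.csv2 if your files use ";" and decimal commas).

```r
# Hypothetical helper: read every csv under each immediate subfolder of `root`
# and tag each row with the name of the subfolder it came from.
merge_subfolder_csvs <- function(root, reader = read.csv) {
  dirs <- list.dirs(root, recursive = FALSE, full.names = TRUE)
  pieces <- lapply(dirs, function(d) {
    files <- list.files(d, pattern = "\\.csv$", full.names = TRUE,
                        ignore.case = TRUE)
    per_file <- lapply(files, function(f) {
      df <- reader(f, stringsAsFactors = FALSE)
      # rep() keeps this safe for files that contain a header but no rows
      df$subfolder <- rep(basename(d), nrow(df))
      df
    })
    # NULL when the subfolder has no csv files; rbind skips NULLs
    do.call(rbind, per_file)
  })
  do.call(rbind, pieces)
}

# Demo on a throwaway folder (assumed layout: root/1/a.csv, root/2/b.csv)
root <- file.path(tempdir(), "merge_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "1"), recursive = TRUE)
dir.create(file.path(root, "2"), recursive = TRUE)
sample_rows <- data.frame(Date = "02.01.2018", Revenue = 0L,
                          Budget = c(6.25, 5.92))
write.csv(sample_rows, file.path(root, "1", "a.csv"), row.names = FALSE)
write.csv(sample_rows, file.path(root, "2", "b.csv"), row.names = FALSE)

merged <- merge_subfolder_csvs(root)

# Write the combined data to one csv file next to the subfolders
write.csv(merged, file.path(root, "merged.csv"), row.names = FALSE)
```

Collecting the pieces in a list and binding them once avoids growing all_data inside the loop, which gets slow when there are many files. The columns must have the same names in every file for rbind to succeed.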

Upvotes: 1
