DonC

Reputation: 483

Loop in R to read many files

I have been wondering whether anybody knows a way to write a loop that loads files/databases in R. Say I have files named like this: data1.csv, data2.csv, ..., data100.csv.

In some programming languages you can write something like data +{ x }+ .csv, which the system reads as datax.csv, and then loop over x.

Any ideas?

Upvotes: 45

Views: 120642

Answers (9)

Maël

Reputation: 52329

Here's another solution using a for loop. I like it better than the others because of its flexibility and because all dataframes are directly stored in the global environment.

Assuming you've already set your working directory, the loop reads each file in turn and stores it in the global environment under the name "datai" (data1, data2, and so on).

for (i in 1:100) {
  object_name <- paste0("data", i)        # variable to create, e.g. "data1"
  file_name <- paste0("data", i, ".csv")  # file to read, e.g. "data1.csv"
  assign(object_name, read.csv(file_name))
}

Upvotes: 2

CDX

Reputation: 304

# directory_path is the folder that holds the files
fi <- list.files(directory_path, full.names = TRUE)
dat <- lapply(fi, read.csv)

dat will contain the data sets as a list, one element per file.
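
A list is easier to work with when each element carries the name of its file. The following is a minimal sketch, assuming a hypothetical temporary folder with three small CSV files (the folder, the file contents, and the pattern filter are all made up for the illustration):

```r
# Hypothetical setup: a temporary folder with three small CSV files.
directory_path <- file.path(tempdir(), "csv_demo")
dir.create(directory_path, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(directory_path, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# Read every CSV file in the folder into a list.
fi <- list.files(directory_path, pattern = "\\.csv$", full.names = TRUE)
dat <- lapply(fi, read.csv)

# Label each element with its file name, without folder or extension.
names(dat) <- tools::file_path_sans_ext(basename(fi))

dat[["data2"]]  # look up one data set by name
```

With names attached, you can pick out a data set by name instead of by position.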

Upvotes: 5

meklit chernet

Reputation: 11

  1. First, set the working directory.
  2. Find and store all the files ending with .csv.
  3. Bind all of them row-wise.

Following is the code sample:

setwd("C:/yourpath")
# pattern is a regular expression, not a wildcard: match names ending in ".csv"
temp <- list.files(pattern = "\\.csv$")
allData <- do.call("rbind", lapply(temp, read.csv))
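
After a row-wise bind you can no longer tell which file a row came from. Here is a minimal sketch of one way to keep track, assuming all files share the same columns. The temporary folder, the file contents, the helper read_with_source and the column source_file are hypothetical names for the illustration:

```r
# Hypothetical setup: three CSV files with identical columns.
demo_dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = c(i, i), value = c(1, 2) * i),
            file.path(demo_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record where its rows came from.
read_with_source <- function(path) {
  df <- read.csv(path)
  df$source_file <- basename(path)
  df
}

# Bind all files row-wise; rbind requires matching column names.
allData <- do.call(rbind, lapply(files, read_with_source))
```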

Upvotes: 0

epo3

Reputation: 3121

Let's assume that your files follow the naming scheme you mentioned in your question and that they are located in the working directory.

You can vectorise the creation of the file names if they have a simple naming structure, then apply a loading function to all the files (here I used the purrr package, but you can also use lapply):

library(purrr)
c(1:100) %>% paste0("data", ., ".csv") %>% map(read.csv)
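
For comparison, a base R version with lapply might look like the sketch below. It assumes a hypothetical temporary folder with only three files so that it runs anywhere; for the files in the question you would use 1:100 and your own directory.

```r
# Hypothetical setup: three small files in a temporary folder.
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x = i),
            file.path(demo_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# paste0() is vectorised, so this builds all file names in one call.
file_names <- paste0("data", 1:3, ".csv")

# Read each file; the result is a list with one data frame per file.
data_list <- lapply(file.path(demo_dir, file_names), read.csv)
names(data_list) <- file_names
```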

Upvotes: 2

SDahm

Reputation: 436

This may be helpful if you have datasets for participants as in psychology/sports/medicine etc.

library(haven)  # provides read_sav()

setwd("C:/yourpath")

# pattern is a regular expression: match names ending in ".sav"
temp <- list.files(pattern = "\\.sav$")

# Maybe you want to exclude some IDs
drop <- grepl("ID(04|08|11|13|19)\\.sav$", temp)
temp2 <- temp[!drop]

# Make a list that contains all data sets
read.all <- lapply(temp2, read_sav)
# View(read.all[[1]])

# Option 1: stack the data sets row-wise
df <- do.call("rbind", read.all)

Option 2: compute something within each data set (one per ID), e.g. the mean of certain variables for each participant.

mw_extraktion <- function(data_raw) {
  data_raw <- data.frame(data_raw)
  ID <- data_raw$ID[1]
  # Compute your summary values here, e.g. the mean of a variable for this ID.
  # Var2 and Var3 are placeholder column names; replace them with your own.
  Var2 <- mean(data_raw$Var2, na.rm = TRUE)
  Var3 <- mean(data_raw$Var3, na.rm = TRUE)
  c(ID = ID, Var2 = Var2, Var3 = Var3)
} # end of function
data_combined <- t(sapply(read.all, mw_extraktion))  # one row per ID

Upvotes: -1

Aadhya Manu Anand

Reputation: 87

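Read every CSV file in the working directory and merge them into one data frame. bind_rows matches columns by their header names, so the headers of each file decide where its values go in the merged result.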

library(dplyr)  # provides %>% and bind_rows()

# pattern is a regular expression: match names ending in ".csv"
list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()

Upvotes: 7

PAC

Reputation: 5376

I would put all the CSV files in one directory, create a list, and loop over the files to read each one into the list.

setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # names of all the csv files in the directory
for (k in seq_along(listcsv)) { # seq_along is safe when no files are found
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]])

Upvotes: 11

Gavin Simpson

Reputation: 174938

Sys.glob() is another possibility - its sole purpose is globbing, i.e. wildcard expansion.

dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)

That will read all the files of the form data[x].csv into the list dataFiles, where [x] can be any string, including the empty string.

[Note that this is a different kind of pattern from the one in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() uses standard wildcards. Which wildcards can be used is system dependent; details can be found on the help page ?Sys.glob.]
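
To make the difference between wildcards and regular expressions concrete, here is a small sketch. It assumes a hypothetical temporary folder with four empty files (all names invented for the illustration) and uses utils::glob2rx() to translate a wildcard into the equivalent regular expression:

```r
# Hypothetical setup: four empty files in a temporary folder.
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("data1.csv", "data2.csv", "metadata.csv", "data1.csv.bak")))

# Wildcard matching: the whole file name must fit the pattern.
globbed <- basename(Sys.glob(file.path(demo_dir, "data*.csv")))

# An unanchored regular expression matches anywhere inside the name,
# so it also picks up "metadata.csv" and "data1.csv.bak".
loose <- list.files(demo_dir, pattern = "data.*csv")

# glob2rx() converts the wildcard into an anchored regular expression
# that list.files() can use to get the same files as Sys.glob().
rx <- utils::glob2rx("data*.csv")
strict <- list.files(demo_dir, pattern = rx)
```

In this sketch globbed and strict contain only data1.csv and data2.csv, while loose contains all four files.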

Upvotes: 62

Joshua Ulrich

Reputation: 176718

See ?list.files.

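# pattern is a regular expression; anchoring it and escaping the dot
# keeps out names such as "metadata.csv" or "data1.csv.bak"
myFiles <- list.files(pattern = "^data.*\\.csv$")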

Then you can loop over myFiles.
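
One way to loop over myFiles is sketched below. It assumes a hypothetical temporary folder with three small CSV files so that the example runs anywhere; the folder and the variable results are made up for the illustration.

```r
# Hypothetical setup: three small CSV files in a temporary folder.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x = i),
            file.path(demo_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# full.names = TRUE returns paths that work from any working directory.
myFiles <- list.files(demo_dir, pattern = "^data.*\\.csv$", full.names = TRUE)

# Pre-allocate a list, then fill it one file at a time.
results <- vector("list", length(myFiles))
for (i in seq_along(myFiles)) {
  results[[i]] <- read.csv(myFiles[i])
}
names(results) <- basename(myFiles)
```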

Upvotes: 35
