Reputation: 483
I have been wondering if anybody knows a way to create a loop that loads files/databases in R. Say I have some files like this: data1.csv, data2.csv, ..., data100.csv.
In some programming languages one can do something like data + { x } + .csv, and the system recognizes it as datax.csv, so you can then apply the loop.
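In R the analogous string construction would be paste0(); a minimal sketch of what I mean:
paste0("data", 7, ".csv") # yields "data7.csv"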
Any ideas?
Upvotes: 45
Views: 120642
Reputation: 52329
Here's another solution using a for loop. I like it better than the others because of its flexibility, and because all data frames are stored directly in the global environment.
Assuming you've already set your working directory, the loop will iteratively read all the files and store each one in the global environment under the name datai (i.e. data1, data2, ..., data100).
for (i in 1:100) {
  objname  <- paste0("data", i)         # name for the object, e.g. "data1"
  filename <- paste0("data", i, ".csv") # name of the file, e.g. "data1.csv"
  assign(objname, read.csv(filename))
}
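Since assign() creates each object by its constructed name, a later lookup can use get(); for example:
head(get("data1")) # inspect the first of the data frames created above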
Upvotes: 2
Reputation: 304
# directory_path must point at the folder holding the csv files
fi <- list.files(directory_path, full.names = TRUE)
dat <- lapply(fi, read.csv)
dat will contain the datasets in a list
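If you want to keep track of which element came from which file, one option (using the same fi as above) is to name the list elements after the files:
names(dat) <- basename(fi) # e.g. dat[["data1.csv"]] returns that dataset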
Upvotes: 5
Reputation: 11
This reads all the .csv files in the directory and combines them into one data frame. Following is the code sample:
setwd("C:/yourpath")
# list.files() takes a regular expression, so anchor the .csv extension
temp <- list.files(pattern = "\\.csv$")
allData <- do.call("rbind", lapply(temp, read.csv))
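Note that do.call("rbind", ...) assumes every file has the same columns. If you also want to record which file each row came from, a possible sketch (the source column name is just an example):
# Tag each row with its source file name before binding
dfs <- lapply(temp, function(f) transform(read.csv(f), source = f))
allData <- do.call("rbind", dfs)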
Upvotes: 0
Reputation: 3121
Let's assume that your files have the file format that you mentioned in your question and that they are located in the working directory.
You can vectorise creation of the file names if they have a simple naming structure, then apply a loading function to all of them (here I used the purrr package, but you could also use lapply).
library(purrr)
c(1:100) %>% paste0("data", ., ".csv") %>% map(read.csv)
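To get one combined data frame rather than a list of them, a possible extension using purrr's map_dfr (the .id column name "file" is arbitrary):
library(purrr)
paste0("data", 1:100, ".csv") %>%
  set_names() %>%                 # name each element after its file
  map_dfr(read.csv, .id = "file") # row-bind, tagging rows with the file name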
Upvotes: 2
Reputation: 436
This may be helpful if you have datasets for participants as in psychology/sports/medicine etc.
library(haven) # provides read_sav() for SPSS .sav files
setwd("C:/yourpath")
temp <- list.files(pattern = "\\.sav$")
#Maybe you want to unselect/delete IDs
DEL <- grep('ID(04|08|11|13|19).sav', temp)
temp2 <- temp[-DEL]
#Make a list that contains all data
read.all <- lapply(temp2, read_sav)
#View(read.all[[1]]) #inspect the first dataset
#Option 1: put one under the next
df <- do.call("rbind", read.all)
#Option 2: do something within each dataset (single IDs), e.g. get the mean of certain variables for each participant
mw_extraktion <- function(data_raw){
  data_raw <- data.frame(data_raw)
  #you may now calculate e.g. the mean of a certain variable for each ID
  ID <- data_raw$ID[1]
  data_OneID <- c(ID, Var2, Var3) #put your new variables (e.g. means) here; Var2 and Var3 are placeholders
  return(data_OneID)
} #end of function
data_combined <- t(data.frame(sapply(read.all, mw_extraktion) ) )
Upvotes: -1
Reputation: 87
Read all the CSV files and combine them into one merged data frame; bind_rows() matches the columns by their headers.
library(dplyr)
library(readr)
list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()
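Since readr is loaded anyway, its read_csv() could be used instead of read.csv; it is faster and keeps strings as character, so no stringsAsFactors argument is needed:
list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read_csv) %>%
  bind_rows()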
Upvotes: 7
Reputation: 5376
I would put all the CSV files in one directory, create an empty list, and loop over the directory's files, reading each one into the list.
setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # all the csv files in the directory
for (k in seq_along(listcsv)) {
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]])
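Optionally, naming the list elements after their files makes individual datasets easier to look up:
names(ldf) <- listcsv
str(ldf[[listcsv[1]]]) # same as str(ldf[[1]]), but by file name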
Upvotes: 11
Reputation: 174938
Sys.glob() is another possibility - its sole purpose is globbing, or wildcard expansion.
dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)
That will read all the files of the form data[x].csv into the list dataFiles, where [x] is nothing or anything.
[Note this is a different pattern to that in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() just uses standard wildcards; which wildcards can be used is system dependent, and details can be found on the help page ?Sys.glob.]
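For example, in a glob ? matches exactly one character while * matches any run of characters (including none):
Sys.glob("data?.csv") # matches data1.csv ... data9.csv, but not data10.csv
Sys.glob("data*.csv") # matches data.csv, data1.csv, data100.csv, ...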
Upvotes: 62
Reputation: 176718
See ?list.files.
myFiles <- list.files(pattern="data.*csv")
Then you can loop over myFiles.
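A minimal sketch of such a loop, reading each file into a named list:
dataList <- list()
for (f in myFiles) {
  dataList[[f]] <- read.csv(f)
}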
Upvotes: 35