mxttgen31

Reputation: 123

Loop through a list of file names stored in a text file in R

list.files() can help find files in a directory, but how can I loop through a list of files that is already stored in a text file? Here all_my_files.txt lists the path to each file, one per row:

file.txt
file2.txt
file3.txt

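library(data.table)

# read the file paths, one per line, into a character vector
files <- readLines("all_my_files.txt")
x <- numeric(length(files))
for (i in seq_along(files))
{
  df <- fread(files[i])
  # V1 is the name fread gives the first column when a file has no header row
  x[i] <- mean(df$V1)
}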

Upvotes: 0

Views: 2326

Answers (2)

DaveTurek

Reputation: 1297

You can use lapply to loop through your file names.

I use iris, as @bs93 does, but split it into 3 separate data.frames.

iris1 <- iris[1:50,]
iris2 <- iris[51:100,]
iris3 <- iris[101:150,]

# write them to text files
write.table(iris1,file="iris1.txt",row.names=FALSE)
write.table(iris2,file="iris2.txt",row.names=FALSE)
write.table(iris3,file="iris3.txt",row.names=FALSE)

# create the text file containing the filenames
filenames <- paste0("iris", 1:3, ".txt")
writeLines(filenames,"filenames.txt")

# Now solve the problem
# read the filenames into a character vector
fn <- readLines("filenames.txt")

# apply `read.table` over that vector of filenames
Ilist <- lapply(fn,read.table,header=TRUE)

# Ilist is a list containing 3 data.frames
str(Ilist)

# Get the mean Sepal.Length from each data.frame in Ilist
x <- sapply(Ilist,function(z) mean(z$Sepal.Length))
x

# if you want to use `data.table` 
library(data.table)

# then you can use `fread` instead of `read.table`
Ilist <- lapply(fn,fread)

# Then Ilist will be a list of 3 data.tables

Upvotes: 1

bs93

Reputation: 1316

Here is a small, reproducible example. We use the built-in iris data set and save it 3 times to the working directory as 'iris1.csv', 'iris2.csv', and 'iris3.csv'. We also save the relative paths to those files (which are just 'iris1.csv', 'iris2.csv', and 'iris3.csv') to a text file called 'all_my_files.txt'. We can then read the file paths back in from 'all_my_files.txt' and read the data each path points to.

data.table + loop solution

library(data.table)
library(tidyverse)

#make filenames
filenames <- paste0("iris", 1:3, ".csv")

#save the iris dataset three times, naming the files 'iris1.csv', 'iris2.csv' etc
#the output path is passed by position because newer readr versions renamed `path` to `file`
walk(filenames, ~write_csv(iris, .x))

#save the filepath
writeLines(filenames, "all_my_files.txt")

#read all the filepaths back in from text file
get_filenames_from_file <- readLines("all_my_files.txt")

files <- vector("list", length(get_filenames_from_file))
mean_v1 <- numeric(length(get_filenames_from_file))
for (i in seq_along(get_filenames_from_file)){
  dat <- fread(get_filenames_from_file[[i]])
  files[[i]] <- dat
  #get the mean of a column
  mean_v1[i] <- mean(dat$Sepal.Length)
}

Full tidyverse solution:

library(tidyverse)

#make filenames
filenames <- paste0("iris", 1:3, ".csv")

#save the iris dataset three times, naming the files 'iris1.csv', 'iris2.csv' etc
#the output path is passed by position because newer readr versions renamed `path` to `file`
walk(filenames, ~write_csv(iris, .x))

#save the filepath
writeLines(filenames, "all_my_files.txt")

#read all the filepaths back in from text file
get_filenames_from_file <- readLines("all_my_files.txt")

#read the data in from the filepaths
data <- map(get_filenames_from_file, read_csv)

In either case we now have a list of 3 iris data sets (`files` in the loop solution, `data` in the tidyverse solution). For the tidyverse version:

str(data)
List of 3
 $ : tibble [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   Sepal.Length = col_double(),
  .. ..   Sepal.Width = col_double(),
  .. ..   Petal.Length = col_double(),
  .. ..   Petal.Width = col_double(),
  .. ..   Species = col_character()
  .. .. )
 $ : tibble [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   Sepal.Length = col_double(),
  .. ..   Sepal.Width = col_double(),
  .. ..   Petal.Length = col_double(),
  .. ..   Petal.Width = col_double(),
  .. ..   Species = col_character()
  .. .. )
 $ : tibble [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
  ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   Sepal.Length = col_double(),
  .. ..   Sepal.Width = col_double(),
  .. ..   Petal.Length = col_double(),
  .. ..   Petal.Width = col_double(),
  .. ..   Species = col_character()
  .. .. )
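To get from that list to the per-file mean the question asks about, `map_dbl` from purrr can be applied to it. This is a minimal sketch rather than the only way to do it: the temporary directory and the names `tmp`, `paths`, `list_file`, `files_to_read` and `mean_sepal_length` are assumptions made for illustration, and the files are recreated so the block is self-contained.

```r
library(readr)
library(purrr)

# Hypothetical setup: save iris three times in a temporary directory
tmp <- tempfile("iris_csv")
dir.create(tmp)
paths <- file.path(tmp, paste0("iris", 1:3, ".csv"))
walk(paths, ~ write_csv(iris, .x))

# Text file listing one path per line
list_file <- file.path(tmp, "all_my_files.txt")
writeLines(paths, list_file)

# Read the paths back and name each one by its file name
files_to_read <- readLines(list_file)
files_to_read <- set_names(files_to_read, basename(files_to_read))

# Read every file; col_types = cols() guesses types without printing a message
data <- map(files_to_read, ~ read_csv(.x, col_types = cols()))

# One mean per file, returned as a named numeric vector
mean_sepal_length <- map_dbl(data, ~ mean(.x$Sepal.Length))
print(mean_sepal_length)
```

Because all three files hold the full iris data, the three means are identical here; with real files each element would reflect its own file.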

Upvotes: 0
