user2783615

Reputation: 839

Read multiple files into separate data frames and process every dataframe

For every file in a directory, I want to read the file into its own data frame and then process that data frame, for example by calculating the correlation between its columns. For example:

files <- list.files(path=".")
names <- substr(files,18,20)

for(i in c(1:length(names))){
     name <- names[i]    
     assign (name, read.table(files[i]))
     sapply(3:ncol(name), function(y) cor(name[, 2], name[, y], ))      
}

But 'name' is a string in the last statement of the code, not a data frame. How can I process the data frame that 'name' refers to?

Upvotes: 1

Views: 3804

Answers (2)

coanil

Reputation: 272

The way to do this is to put all the files you wish to read into one folder, and then work with lists:

your.dir <- "."  # adjust: the folder containing your files
files <- list.files(your.dir)

your.dfs <- lapply(file.path(your.dir, files), read.table)

your.dfs is now a list holding all your data frames. You can apply a function to every data frame in one call using lapply, or you can access individual data frames with the usual subsetting syntax, for example your.dfs[[1]] to access the first data frame.
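As a rough sketch of that lapply step: the temporary directory, the file names a.tsv and b.tsv, and the column layout below are all made up for illustration, and I'm assuming whitespace-delimited files with a header row. Naming the list elements after the files gives you the lookup by name that assign was meant to provide.

```r
# Hypothetical setup: write two small files into a temporary directory
your.dir <- file.path(tempdir(), "dfs_example")
dir.create(your.dir, showWarnings = FALSE)
set.seed(1)
for (f in c("a.tsv", "b.tsv")) {
  write.table(data.frame(id = 1:10, x = rnorm(10), y = rnorm(10), z = rnorm(10)),
              file.path(your.dir, f), row.names = FALSE)
}

# Read every file into one list and name the elements after the files
files <- list.files(your.dir, pattern = "\\.tsv$", full.names = TRUE)
your.dfs <- lapply(files, read.table, header = TRUE)
names(your.dfs) <- sub("\\.tsv$", "", basename(files))

# Correlation of column 2 with every later column, for each data frame
cors <- lapply(your.dfs, function(df) cor(df)[2, -(1:2)])

cors$a          # result for the data frame read from a.tsv
your.dfs[["b"]] # the data frame itself, looked up by name
```

Because the data frames live in a named list, there is no need to build variable names from strings at all.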

Upvotes: 0

Jake Burkhead

Reputation: 6535

This is exactly what R's lists are for. Also, calling sapply to get all of the correlations is unnecessary: cor returns the full correlation matrix when given a data frame, so you can just subset it.

R> files <- list.files(pattern = "tsv")
R> dat <- lapply(files, read.table)
R> dat
[[1]]
          a        b  c
1  2.802164 4.835557  6
2  1.680186 4.974198  3
3  3.002777 4.670041  6
4  2.182691 5.137982 11
5  4.206979 5.170269  5
6  1.307195 4.753041  9
7  2.919497 4.657171  7
8  2.938614 5.305558  9
9  2.575200 4.893604  2
10 1.548161 4.871108  4

[[2]]
            a b  c
1  -1.8483890 2  6
2  -2.9035164 0  7
3  -0.6490283 1  6
4  -2.8842633 3  2
5  -1.8803775 0 12
6  -3.0267870 1  9
7   0.5287124 0  7
8  -3.7220733 0  2
9  -2.0663912 2  9
10 -1.6232248 1  6

You can then lapply over this list again to process it, or do the whole thing as a one-liner.

R> dat <- lapply(files, function(x) cor(read.table(x))[1,-1] )
R> dat
[[1]]
          b           c 
 0.27236143 -0.04973541 

[[2]]
         b          c 
-0.1440812  0.2771511 
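If every file has the same columns, one possible variation is to use sapply over the file names so the results collapse into a matrix labelled by file. This is only a sketch: the temporary directory and the file names one.tsv and two.tsv are invented for the example, and the data is random.

```r
# Hypothetical data: two small files in a temporary directory
tmp <- file.path(tempdir(), "cor_example")
dir.create(tmp, showWarnings = FALSE)
set.seed(42)
for (f in c("one.tsv", "two.tsv")) {
  write.table(data.frame(a = rnorm(10), b = rnorm(10), c = rpois(10, 6)),
              file.path(tmp, f))
}

files <- list.files(tmp, pattern = "\\.tsv$", full.names = TRUE)

# sapply uses the character input as names, so each column is labelled
# with its file path; basename() strips the directory part
res <- sapply(files, function(x) cor(read.table(x))[1, -1])
colnames(res) <- basename(colnames(res))
res  # rows b and c, one column per file
```

With lapply you get a list back instead, which is the safer choice when the files do not all share the same columns.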

Upvotes: 3
