Emily

Reputation: 899

How to apply a function to every possible pairwise combination of files stored in a common directory

I have a directory containing a large number of csv files. I would like to load the data into R and apply a function to every possible pair combination of csv files in the directory, then write the output to file.

The function that I would like to apply is matchpt() from the Biobase package, which compares locations between two data frames.

Here is an example of what I would like to do (although I have many more files than this):

  1. Three files in directory: A, B and C
  2. Perform matchpt on each pairwise combination: nn1 = matchpt(A,B), nn2 = matchpt(A,C), nn3 = matchpt(B,C)
  3. Write nn1, nn2 and nn3 to csv file.

I have not been able to find any solutions for this yet and would appreciate any suggestions. I am really not sure where to go from here but I am assuming that some sort of nested for loop is required to somehow cycle sequentially through all pairwise combinations of files. Below is a beginning at something but this only compares the first file with all the others in the directory so does not work!

library("Biobase")

# create two lists of identical filenames stored in the directory:
filenames1 = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)
filenames2 = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)

for(i in 1:length(filenames2)){
  # load the first data frame in list 1
  df1 <- lapply(filenames1[1], read.csv, header=TRUE, stringsAsFactors=FALSE)
  df1 <- data.frame(df1)
  # load a second data frame from list 2
  df2 <- lapply(filenames2[i], read.csv, header=TRUE, stringsAsFactors=FALSE)
  df2 <- data.frame(df2)

  # isolate the relevant columns from within the two data frames
  dat1 <- as.matrix(df1[, c("lat", "long")])
  dat2 <- as.matrix(df2[, c("lat", "long")])

  # run the matchpt function on the two data frames
  nn <- matchpt(dat1, dat2)

  # extract the unique id code in the two filenames (for naming the output file)
  file1 = filenames1[1]
  code1 = strsplit(file1,"_")[[1]][1]
  file2 = filenames2[i]
  code2 = strsplit(file2,"_")[[1]][1]
  outname = paste(code1, code2, sep="_")
  outfile = paste(outname, "_nn.csv", sep="")
  write.csv(nn, file=outfile, row.names=FALSE)

}

Any suggestions on how to solve this problem would be greatly appreciated. Many thanks!

Upvotes: 1

Views: 1500

Answers (3)

zx8754

Reputation: 56249

Try this example:

#dummy filenames
filenames <- paste0("file_",1:5,".txt")

#loop through each unique combination (j starts at i+1, so no pair is repeated)
for(i in 1:(length(filenames)-1)) {
  for(j in (i+1):length(filenames)) {
    flush.console()
    print(paste("i=",i,"j=",j,"|","file1=",filenames[i],"file2=",filenames[j]))
  }
}
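
The same unordered pairs can also be listed with base R's combn(), which avoids the index arithmetic. This is only a minimal sketch: the three file names are made up for illustration and assume the "<id>_<rest>.csv" naming used in the question, and the "_nn.csv" suffix is taken from the question's code.

```r
# Hypothetical file names following the "<id>_<rest>.csv" pattern from the question
filenames <- c("A_points.csv", "B_points.csv", "C_points.csv")

# combn() returns one column per unordered pair; transpose to get one row per pair
pairs <- t(combn(filenames, 2))

# Helper (illustrative): the id is whatever precedes the first underscore
id <- function(f) strsplit(basename(f), "_", fixed = TRUE)[[1]][1]

# Build one output name per pair, e.g. "<id1>_<id2>_nn.csv"
outfiles <- paste0(
  apply(pairs, 1, function(p) paste(id(p[1]), id(p[2]), sep = "_")),
  "_nn.csv"
)

print(data.frame(file1 = pairs[, 1], file2 = pairs[, 2], outfile = outfiles))
```

Each row of pairs can then be handed to whatever reads the two files and runs the comparison.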

Upvotes: 1

Emily

Reputation: 899

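In response to my own question, I seem to have found a solution. The code below uses two nested for loops to run the comparison on every ordered pair of files in a common directory. This seems to work and gives EVERY combination of files, i.e. both A & B and B & A. Note that, because both loops run over the full list, each file is also compared with itself (A & A).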

# create a list of filenames
filenames = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)

# For loop to compare the files
for(i in 1:length(filenames)){

  # load the first data frame in the list
  df1 = lapply(filenames[i], read.csv, header=TRUE, stringsAsFactors=FALSE)
  df1 = data.frame(df1)
  file1 = filenames[i]
  code1 = strsplit(file1,"_")[[1]][1] # extract unique id code of file (in case where the id comes before an underscore)
  # isolate the columns of interest within the first data frame
  d1 <- as.matrix(df1[, c("lat_UTM", "long_UTM")]) 

  # load the comparison file
  for (j in 1:length(filenames)){

    # load the second data frame in the list
    df2 = lapply(filenames[j], read.csv, header=TRUE, stringsAsFactors=FALSE)
    df2 = data.frame(df2)
    file2 = filenames[j]
    code2 = strsplit(file2,"_")[[1]][1] # extract unique id code of file 2
    # isolate the columns of interest within the second data frame
    d2 <- as.matrix(df2[, c("lat_UTM", "long_UTM")])

    # run the comparison function on the two data frames (in this case matchpt)
    out <- matchpt(d1, d2)
    # merge the unique id codes of the two filenames (for naming the output file)
    outname = paste(code1, code2, sep="_")
    outfile = paste(outname, "_out.csv", sep="")
    # write the result to file
    write.csv(out, file=outfile, row.names=FALSE)
  }
}
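
If only the unordered pairs are wanted (A & B but not B & A or A & A), a variation is to read each file once up front and start the inner loop at i + 1. The following is a rough sketch under several assumptions, not a drop-in replacement: nearest() is a hypothetical base-R stand-in so the example runs without Biobase (swap in matchpt() for real use), and the demo directory and its three small csv files are invented for illustration.

```r
# Hypothetical stand-in for matchpt(): for each row of x, the index of and
# Euclidean distance to the nearest row of y (base R only)
nearest <- function(x, y) {
  idx <- integer(nrow(x))
  dst <- numeric(nrow(x))
  for (r in seq_len(nrow(x))) {
    d <- sqrt(rowSums((y - matrix(x[r, ], nrow(y), ncol(y), byrow = TRUE))^2))
    idx[r] <- which.min(d)
    dst[r] <- min(d)
  }
  data.frame(index = idx, distance = dst)
}

# Invented demo data: three small files in a temporary directory
demo_dir <- file.path(tempdir(), "pairwise_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (id in c("A", "B", "C")) {
  demo <- data.frame(lat_UTM = c(1, 2, 3) + match(id, LETTERS),
                     long_UTM = c(10, 20, 30))
  write.csv(demo, file.path(demo_dir, paste0(id, "_points.csv")), row.names = FALSE)
}

# full.names = TRUE so read.csv() works regardless of the working directory
filenames <- list.files(demo_dir, pattern = "_points\\.csv$", full.names = TRUE)

# Read every file once and keep only the coordinate columns
coords <- lapply(filenames, function(f) {
  as.matrix(read.csv(f, stringsAsFactors = FALSE)[, c("lat_UTM", "long_UTM")])
})
# Id code = part of the file name before the first underscore
codes <- vapply(strsplit(basename(filenames), "_", fixed = TRUE), `[`, character(1), 1)

# Loop over unordered pairs only (assumes at least two files)
n <- length(filenames)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    out <- nearest(coords[[i]], coords[[j]])
    outfile <- file.path(demo_dir, paste0(codes[i], "_", codes[j], "_out.csv"))
    write.csv(out, file = outfile, row.names = FALSE)
  }
}
```

Reading the files once matters when there are many of them: the nested version above re-reads the second file on every inner iteration.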

Upvotes: 1

Greg Snow

Reputation: 49670

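You could do something like:

out <- combn( list.files(), 2, FUN=function(pair) matchpt(pair[1], pair[2]), simplify=FALSE )
write.table( do.call( rbind, out ), file='output.csv', sep=',' )

This assumes that matchpt accepts 2 strings with the names of the files and that the result has the same structure each time, so that the rbinding makes sense. Note that combn passes each pair to FUN as a single vector of length 2, hence the small wrapper function, and that simplify=FALSE keeps the results as a list for do.call. In the question's code matchpt is given coordinate matrices rather than file names, so in practice the wrapper would need to read the two files first.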

You could also write your own function to pass to combn that takes the 2 file names, runs matchpt and then appends the results to the csv file. Remember that if you pass an open file connection (rather than a file name) to write.table, each call continues writing where the previous one stopped instead of overwriting what is there.
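
A possible shape for that second suggestion is sketched here. It rests on assumptions: compare_pair() is a hypothetical placeholder that only reports row counts (the real version would read the coordinates and call matchpt), and the demo directory and files are invented so the example runs on its own.

```r
# Invented demo data: three small csv files in a temporary directory
demo_dir <- file.path(tempdir(), "combn_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (id in c("A", "B", "C")) {
  write.csv(data.frame(lat = 1:2, long = 3:4),
            file.path(demo_dir, paste0(id, "_points.csv")), row.names = FALSE)
}
files <- list.files(demo_dir, pattern = "_points\\.csv$", full.names = TRUE)

# Hypothetical comparison: replace the body with the real work, e.g. matchpt()
# on the coordinate matrices of the two files
compare_pair <- function(pair) {
  a <- read.csv(pair[1])
  b <- read.csv(pair[2])
  data.frame(file1 = basename(pair[1]), file2 = basename(pair[2]),
             rows1 = nrow(a), rows2 = nrow(b))
}

outfile <- file.path(demo_dir, "output.csv")
con <- file(outfile, open = "w")  # opened once; each write continues after the last
first <- TRUE                     # write the header only for the first pair

# combn() hands each pair to FUN as one character vector of length 2
invisible(combn(files, 2, FUN = function(pair) {
  res <- compare_pair(pair)
  write.table(res, con, sep = ",", row.names = FALSE, col.names = first)
  first <<- FALSE
  nrow(res)  # return something small; the real output goes to the file
}, simplify = FALSE))

close(con)
print(read.csv(outfile))
```

Writing as you go keeps memory use flat when there are many pairs, at the cost of needing every result to share the same columns.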

Upvotes: 2
