mynameisJEFF

Reputation: 4239

Multivariate apply function to compare, pair-wise, a set of files

I have a character vector containing the names of files that hold data vectors: c("tom.txt", "tim.txt" , "Amy.txt"). My task is to build a symmetric matrix that ends up looking like this:

> m
        tom.txt tim.txt amy.txt
tom.txt       0      10       5
tim.txt      10       0       7
amy.txt       5       7       0

The entries are obtained by a function called get.result(vec1, vec2), which finds the data vectors corresponding to the two text files in the directory, performs some operations on them, and returns the value for that position in the matrix. For instance, to get the entry m["tom.txt", "tim.txt"], I call get.result("tom.txt", "tim.txt"). The content of this function is not important.

However, if I want to compute the value for all entries, it will be tedious to keep typing get.result("tom.txt", "amy.txt"), get.result("tim.txt", "amy.txt"), especially when I am actually working with 100 different text files.

My question: Is there an efficient way to program this so that each text file is compared against all the others (never against itself) while I keep track of their positions in the matrix?

Should I initialise the matrix with all zeros at the beginning and set the column and row names to the text file names? In that case, I am not sure how to retrieve the column and row names so that I can pass them into get.result(vec1, vec2).

Upvotes: 0

Views: 257

Answers (4)

Ferdinand.kraft

Reputation: 12829

Given that your file names are in a vector, say

vec <- c("tom.txt", "tim.txt" , "Amy.txt")

you can use

# Call get.result() only below the diagonal (x > y); every other cell gets 0
temp <- outer(seq_along(vec), seq_along(vec),
              Vectorize(function(x, y) if (x > y) get.result(vec[x], vec[y]) else 0))
result <- temp + t(temp)

Note that this makes sure get.result() is called only once for every relevant comparison, i.e., it's not called for equal files, nor is it called for pairs that differ only by order.

The last line creates a symmetric matrix.

EDIT: to get the names, use this:

rownames(result) <- colnames(result) <- vec
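
To try the approach above without the real files, here is a minimal sketch. The get.result() in it is a made-up stand-in that looks up invented scores and counts its own calls; only the outer()/Vectorize() pattern comes from this answer.

```r
vec <- c("tom.txt", "tim.txt", "Amy.txt")

# ASSUMPTION: invented scores and a stand-in get.result(); the real function
# reads the two files. The counter only shows how often it gets called.
scores <- c(tom.txt = 1, tim.txt = 11, Amy.txt = 6)
calls <- 0
get.result <- function(vec1, vec2) {
  calls <<- calls + 1
  abs(scores[[vec1]] - scores[[vec2]])
}

# Evaluate only the pairs below the diagonal, then mirror them.
temp <- outer(seq_along(vec), seq_along(vec),
              Vectorize(function(x, y) if (x > y) get.result(vec[x], vec[y]) else 0))
result <- temp + t(temp)
rownames(result) <- colnames(result) <- vec

calls    # number of get.result() calls, one per unordered pair
result
```

With three files, calls ends up at 3, i.e. choose(3, 2), and the diagonal stays at 0.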

Upvotes: 0

IRTFM

Reputation: 263481

The combn function gives you distinct combinations of vector elements:

combs <-combn( c("tom.txt", "tim.txt" , "Amy.txt") , 2)
#----------------
     [,1]      [,2]      [,3]     
[1,] "tom.txt" "tom.txt" "tim.txt"
[2,] "tim.txt" "Amy.txt" "Amy.txt"

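You can then apply get.result to each column. Note that apply passes each column as a single length-2 vector, so a two-argument function needs a small wrapper: apply(combs, 2, function(p) get.result(p[1], p[2]))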
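
To get from the combn() output back to the named symmetric matrix, one option is to index the matrix by (row name, column name) pairs. This is only a sketch: get.result() and the scores are made up for illustration, and the wrapper function is there because apply() hands over each column as one length-2 vector.

```r
fn <- c("tom.txt", "tim.txt", "Amy.txt")

# ASSUMPTION: invented scores and a stand-in for the real get.result().
scores <- c(tom.txt = 1, tim.txt = 11, Amy.txt = 6)
get.result <- function(vec1, vec2) abs(scores[[vec1]] - scores[[vec2]])

combs <- combn(fn, 2)                                   # one column per pair
vals  <- apply(combs, 2, function(p) get.result(p[1], p[2]))

# Zero matrix with file names as dimnames, so cells can be addressed by name.
m <- matrix(0, length(fn), length(fn), dimnames = list(fn, fn))

pairs <- t(combs)                                       # one row per pair
m[pairs] <- vals                                        # fill (first, second)
m[pairs[, 2:1, drop = FALSE]] <- vals                   # mirror (second, first)
m
```

Indexing with a two-column character matrix matches each row against the row and column names, which keeps track of the positions without any manual bookkeeping.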

Upvotes: 0

Sutaren

Reputation: 116

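fn = dir(pattern = "\\.txt$") will give you the text files in the current working directory (pass a path as the first argument for another folder, and change the pattern if needed). The pattern is a regular expression, so the dot is escaped and the match is anchored to the end of the name. You could then loop over that list as in the previous answer.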
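
A small sketch of the listing step, run against a throwaway temporary directory so it does not depend on your folder. The file names, including the extra notes.txt.bak, are invented for illustration; list.files() is the same function as dir().

```r
# ASSUMPTION: a temporary directory with made-up files, for illustration only.
tmp <- tempfile("pairwise_demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("tom.txt", "tim.txt", "Amy.txt",
                                       "notes.txt.bak"))))

# Anchored, escaped pattern: only names that end in ".txt".
fn <- list.files(tmp, pattern = "\\.txt$")

# Unanchored ".txt": "." matches any character, so notes.txt.bak is included.
fn_loose <- list.files(tmp, pattern = ".txt")

length(fn)        # 3
length(fn_loose)  # 4

unlink(tmp, recursive = TRUE)  # clean up the demo directory
```

The resulting fn can be passed straight to combn(fn, 2) or used as the row and column names of the matrix.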

Upvotes: 1

djhurio

Reputation: 5536

Try this solution

fn <- c("tom.txt", "tim.txt" , "Amy.txt")

n <- length(fn)

m <- matrix(0, n, n)

rownames(m) <- fn
colnames(m) <- fn

for (i in 1:n) for (j in i:n) if (i!=j) {
  v <- get.result(fn[i], fn[j])
  m[i,j] <- v
  m[j,i] <- v
}

m

Upvotes: 3
