User981636

Reputation: 3621

Comparison of variable to all other variables in data frame

I want to check how close one variable is to all of the other variables in a data frame. I want to do this by counting the rows (i.e. observations) in which the two variables have the same value. For instance, in the mtcars dataset, the variables gear and carb have the same value in 7 rows (i.e. for 7 cars).

I have tried the following, which yields a closeness_matrix. However, the results seem nonsensical. Any idea what is not working?

PS: I also tried to use mapply, which I guess would be faster, but it didn't work, so I ended up with the nested loop.

MWE:

cols_ls <- colnames(mtcars)

closeness_matrix <- matrix(nrow = ncol(mtcars),
                            ncol = ncol(mtcars))

row.names(closeness_matrix) <- cols_ls; colnames(closeness_matrix) <- cols_ls


for (i in 1:length(cols_ls)){

  for (j in i:length(cols_ls)){

    closeness_matrix[i,j] <- sum(duplicated(mtcars[,c(i,j), with = FALSE])==TRUE)

  }
}

Upvotes: 0

Views: 533

Answers (2)

AStieb

Reputation: 183

I guess the following will do it (but I'm sure there is a smarter way):

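# Count, for each column of M, the rows in which it equals the vector v1
closenessFunc <- function(v1, M) {
  apply(M, 2, function(x, v2) {
    sum(x == v2)  # v2 is the column being compared against
  }, v2 = v1)
}
# Run it for every column of mtcars to get the full pairwise matrix
apply(mtcars, MARGIN = 2, closenessFunc, M = mtcars)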

output:

     mpg cyl disp hp drat wt qsec vs am gear carb
mpg   32   0    0  0    0  0    0  0  0    0    0
cyl    0  32    0  0    0  0    0  0  0    8    2
disp   0   0   32  0    0  0    0  0  0    0    0
hp     0   0    0 32    0  0    0  0  0    0    0
drat   0   0    0  0   32  0    0  0  0    1    0
wt     0   0    0  0    0 32    0  0  0    0    0
qsec   0   0    0  0    0  0   32  0  0    0    0
vs     0   0    0  0    0  0    0 32 19    0    7
am     0   0    0  0    0  0    0 19 32    0    4
gear   0   8    0  0    1  0    0  0  0   32    7
carb   0   2    0  0    0  0    0  7  4    7   32
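
The same pairwise count can also be written as two nested sapply calls over the columns. This is only a sketch: closeness is a name chosen here for illustration, and it assumes there are no missing values (true for mtcars), since sum() returns NA as soon as a comparison is NA.

```r
# Hypothetical helper: for every pair of columns, count the rows in which
# both columns hold the same value.
closeness <- function(df) {
  sapply(df, function(a) sapply(df, function(b) sum(a == b)))
}

closeness_matrix <- closeness(mtcars)

# Look up a single pair by name
closeness_matrix["gear", "carb"]
```

Because == on doubles is an exact comparison, this is best suited to columns holding integer-like or categorical values; for measured quantities a tolerance-based comparison may be more appropriate.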

Upvotes: 2

Saurabh

Reputation: 31

Change

sum(duplicated(mtcars[,c(i,j), with = FALSE])==TRUE)

to

sum(mtcars[,i]==mtcars[,j])

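The duplicated function does not work the way you are using it: it flags rows that repeat an earlier row, not rows in which the two columns hold the same value.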
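
Put together, the corrected loop might look like the sketch below. It keeps the variable names from the question. Two things are additions for illustration rather than part of the change suggested above: setting the dimnames inside matrix(), and mirroring each count into the lower triangle, which the original loop leaves as NA because j starts at i.

```r
cols_ls <- colnames(mtcars)
n <- length(cols_ls)

closeness_matrix <- matrix(NA_integer_, nrow = n, ncol = n,
                           dimnames = list(cols_ls, cols_ls))

for (i in seq_len(n)) {
  for (j in i:n) {
    # Number of rows in which column i and column j hold the same value
    matches <- sum(mtcars[, i] == mtcars[, j])
    closeness_matrix[i, j] <- matches
    closeness_matrix[j, i] <- matches  # mirror into the lower triangle
  }
}

closeness_matrix["gear", "carb"]  # 7, the count quoted in the question

# For contrast: duplicated() counts rows whose (gear, carb) combination
# already appeared in an earlier row, which is a different question.
sum(duplicated(mtcars[, c("gear", "carb")]))
```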

Upvotes: 0
