Reputation: 3621
I want to check how close one variable is to every other variable in a data frame, by counting how many times the two have the same value in the same row (i.e. the same observation). For instance, in the mtcars dataset, the variables gear and carb have the same value in 7 rows (i.e. for 7 cars).
I have tried the following, which yields a closeness_matrix. However, the results seem nonsensical. Any idea what is not working?
PS: I also tried to use mapply, which I guess would be faster, but it didn't work, so I ended up with the nested loop.
MWE:
cols_ls <- colnames(mtcars)
closeness_matrix <- matrix(nrow = ncol(mtcars),
                           ncol = ncol(mtcars))
row.names(closeness_matrix) <- cols_ls; colnames(closeness_matrix) <- cols_ls
for (i in 1:length(cols_ls)){
  for (j in i:length(cols_ls)){
    closeness_matrix[i,j] <- sum(duplicated(mtcars[,c(i,j), with = FALSE])==TRUE)
  }
}
Upvotes: 0
Views: 533
Reputation: 183
I guess the following will do it (but I'm sure there is a smarter way):
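# Count, for one column v1, the row-wise matches against every column of M
closenessFunc <- function(v1, M) {
  apply(M, 2, function(x, v2) {
    sum(x == v2)  # number of rows where the two columns agree
  }, v2 = v1)
}
# Apply to every column of mtcars to get the full matrix
apply(mtcars, MARGIN = 2, closenessFunc, M = mtcars)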
output:
     mpg cyl disp hp drat wt qsec vs am gear carb
mpg   32   0    0  0    0  0    0  0  0    0    0
cyl    0  32    0  0    0  0    0  0  0    8    2
disp   0   0   32  0    0  0    0  0  0    0    0
hp     0   0    0 32    0  0    0  0  0    0    0
drat   0   0    0  0   32  0    0  0  0    1    0
wt     0   0    0  0    0 32    0  0  0    0    0
qsec   0   0    0  0    0  0   32  0  0    0    0
vs     0   0    0  0    0  0    0 32 19    0    7
am     0   0    0  0    0  0    0 19 32    0    4
gear   0   8    0  0    1  0    0  0  0   32    7
carb   0   2    0  0    0  0    0  7  4    7   32
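A possibly shorter variant of the same idea, offered only as a sketch: convert the data frame to a matrix once and compare it against each column in turn, letting R recycle the column down the rows. The names m and closeness are illustrative and do not come from the question, and the approach assumes all columns are numeric with no missing values, as in mtcars.

```r
# Sketch: vectorised pairwise match counts (assumes numeric columns, no NAs)
m <- as.matrix(mtcars)

# For each column v, `m == v` recycles v down every column of m
# (length(v) equals nrow(m)), and colSums() counts the matching rows.
closeness <- apply(m, 2, function(v) colSums(m == v))

closeness["gear", "carb"]
```

The result is a square matrix with the variable names on both dimensions, so individual pairs can be looked up by name.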
Upvotes: 2
Reputation: 31
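Change
sum(duplicated(mtcars[,c(i,j), with = FALSE])==TRUE)
to
sum(mtcars[,i]==mtcars[,j])
duplicated() does not do what you are using it for: on a data frame it flags rows that repeat an earlier row, so it compares rows with each other rather than the two columns within a row. Also, with = FALSE is data.table syntax and is not needed when indexing a plain data frame such as mtcars.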
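For reference, here is roughly how the loop from the question might look with that change applied. This is a sketch, not the only way to write it: the variables n and matches are introduced here for illustration, and mirroring each count into the lower triangle is an added assumption (the original loop fills only the upper triangle and leaves the rest as NA).

```r
cols_ls <- colnames(mtcars)
n <- length(cols_ls)

# Square matrix with variable names on both dimensions
closeness_matrix <- matrix(NA_integer_, nrow = n, ncol = n,
                           dimnames = list(cols_ls, cols_ls))

for (i in seq_len(n)) {
  for (j in i:n) {
    # Number of rows in which columns i and j hold the same value
    matches <- sum(mtcars[, i] == mtcars[, j])
    closeness_matrix[i, j] <- matches
    closeness_matrix[j, i] <- matches  # assumption: fill the mirror cell too
  }
}

closeness_matrix["gear", "carb"]
```

The last line should return the 7 matches between gear and carb mentioned in the question.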
Upvotes: 0