Reputation: 446
I have a large data frame (ncol = 220). I want to compare its columns pairwise to see which ones are identical, and produce a matrix so the matches are easy to spot.
What I have:
    x   y   z
1 dog dog cat
2 dog dog dog
3 cat cat cat
What I want:
      x     y     z
x     -  True False
y  True     - False
z False False     -
Is there a way to do this using identical() in R?
Upvotes: 1
Views: 259
Reputation: 24074
This is probably not very efficient, but you can try:
seq_col <- seq_len(ncol(df))
sapply(seq_col, function(i) sapply(seq_col, function(j) identical(df[, i], df[, j])))
#       [,1]  [,2]  [,3]
# [1,]  TRUE  TRUE FALSE
# [2,]  TRUE  TRUE FALSE
# [3,] FALSE FALSE  TRUE
It gives you what you want (except for the diagonal, which is all TRUE here), but there must be a package with a function that builds a distance matrix from character vectors. Maybe something with stringdist?
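If you want the labelled matrix from the question, with the diagonal blanked out, here is a minimal sketch along the same lines. It assumes df is the example data frame from the question (rebuilt below so the snippet runs on its own), and same_cols is just a name picked for illustration. Iterating over the data frame itself rather than over column indices keeps the column names as row and column labels.

```r
# Example data from the question (assumed to be character columns)
df <- data.frame(
  x = c("dog", "dog", "cat"),
  y = c("dog", "dog", "cat"),
  z = c("cat", "dog", "cat"),
  stringsAsFactors = FALSE
)

# Compare every pair of columns with identical();
# sapply() over a data frame iterates over its columns and keeps their names
same_cols <- sapply(df, function(a) sapply(df, function(b) identical(a, b)))

# A column is trivially identical to itself, so blank out the diagonal
diag(same_cols) <- NA

same_cols
#       x     y     z
# x    NA  TRUE FALSE
# y  TRUE    NA FALSE
# z FALSE FALSE    NA
```

This still does ncol^2 comparisons, so it is no faster than the version above; it only changes the labelling and the diagonal.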
Upvotes: 3
Reputation: 51592
To complement @Cath's comment about stringdist, it is as easy as:
library(stringdist)
stringdistmatrix(df, df) == 0
#      [,1]  [,2]  [,3]
#[1,]  TRUE  TRUE FALSE
#[2,]  TRUE  TRUE FALSE
#[3,] FALSE FALSE  TRUE
Upvotes: 4