AudileF

Reputation: 446

How to check if columns in a data frame are identical in R [produce matrix]

I have a large data frame (ncol = 220). I want to compare the columns to see whether any of them are identical, and produce a matrix for ease of identification.

What I have:

      x    y    z
1   dog  dog  cat
2   dog  dog  dog
3   cat  cat  cat

What I want:

      x      y      z
x     -   True  False
y  True      -  False
z False  False      -

Is there a way to do this using identical() in R?

Upvotes: 1

Views: 259

Answers (2)

Cath

Reputation: 24074

Probably not very efficient, but you can try:

# For every pair of column indices (i, j), test whether the two columns are identical
seq_col <- seq_len(ncol(df))
sapply(seq_col, function(i) sapply(seq_col, function(j) identical(df[, i], df[, j])))
      # [,1]  [,2]  [,3]
# [1,]  TRUE  TRUE FALSE
# [2,]  TRUE  TRUE FALSE
# [3,] FALSE FALSE  TRUE

It gives you what you want (except for the diagonal, which is all TRUE here), but there is probably a package with a function that builds a distance matrix from character vectors. Maybe something in stringdist?
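For comparison, here is a minimal base-R sketch of the same pairwise identical() idea. It uses outer() with Vectorize() so that the result carries the column names, and sets the diagonal to NA to stand in for the "-" in the question. The df built here is just the example from the question, assuming character (not factor) columns; same_col and m are illustrative names, not part of the original answer.

    # Example data from the question (character columns assumed)
    df <- data.frame(
      x = c("dog", "dog", "cat"),
      y = c("dog", "dog", "cat"),
      z = c("cat", "dog", "cat"),
      stringsAsFactors = FALSE
    )

    # outer() passes whole index vectors, so wrap identical() with Vectorize()
    same_col <- Vectorize(function(i, j) identical(df[[i]], df[[j]]))
    m <- outer(seq_along(df), seq_along(df), same_col)

    # Label rows and columns with the column names
    dimnames(m) <- list(names(df), names(df))

    # Blank out the trivial self-comparisons, like the "-" in the question
    diag(m) <- NA
    m

This still makes ncol(df)^2 calls to identical(), so 220 columns means 48,400 comparisons; restricting the work to the upper triangle would halve that at the cost of longer code.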

Upvotes: 3

Sotos

Reputation: 51592

To complement @Cath's comment about stringdist, it is as easy as:

library(stringdist)

stringdistmatrix(df, df) == 0

#      [,1]  [,2]  [,3]
#[1,]  TRUE  TRUE FALSE
#[2,]  TRUE  TRUE FALSE
#[3,] FALSE FALSE  TRUE
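With 220 columns, a 220 x 220 matrix is awkward to scan by eye. Below is a small sketch for pulling out only the identical pairs. It assumes a square logical matrix m with the column names as dimnames (the answers above return unnamed matrices, so names would need to be added, e.g. with dimnames(m) <- list(names(df), names(df))). Here m is typed in by hand to match the example output; hits and pairs are illustrative names.

    # "Columns identical" matrix for the question's example, entered by hand
    m <- matrix(c(TRUE,  TRUE,  FALSE,
                  TRUE,  TRUE,  FALSE,
                  FALSE, FALSE, TRUE),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("x", "y", "z"), c("x", "y", "z")))

    # Keep each off-diagonal pair only once by masking with the upper triangle
    hits <- which(m & upper.tri(m), arr.ind = TRUE)

    # Map row/column indices back to column names
    pairs <- data.frame(col1 = rownames(m)[hits[, "row"]],
                        col2 = colnames(m)[hits[, "col"]],
                        stringsAsFactors = FALSE)
    pairs

For the example this leaves a single row, pairing x with y.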

Upvotes: 4
