imran p

Reputation: 332

Correlation between different categories

I have a column (ColA) containing different categories. How can I compute a correlation matrix among Cat1, Cat2 and Cat3 based on their Age values?

df
ColA    Date        Age
Cat1    06-05-2021  34
Cat1    07-05-2021  45
Cat1    08-05-2021  34
Cat2    06-05-2021  54
Cat2    07-05-2021  23
Cat2    08-05-2021  54
Cat3    06-05-2021  56
Cat3    07-05-2021  34
Cat3    08-05-2021  23  
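
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: storing Date as a character column is an assumption, since the post does not say how it is stored.

```r
# Rebuild the sample data shown in the question.
# Assumption: Date is kept as plain character strings (dd-mm-yyyy).
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)
```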

Upvotes: 1

Views: 41

Answers (3)

akrun

Reputation: 886938

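Using dcast from data.table to reshape to wide format (one column per category), then cor:

library(data.table)
# rowid(ColA) numbers the rows within each category; drop that id column before cor()
cor(dcast(setDT(df), rowid(ColA) ~ ColA, value.var = 'Age')[, ColA := NULL])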
#          Cat1       Cat2       Cat3
#Cat1  1.0000000 -1.0000000 -0.1889822
#Cat2 -1.0000000  1.0000000  0.1889822
#Cat3 -0.1889822  0.1889822  1.0000000

Upvotes: 1

GKi

Reputation: 39647

You can use cor after unstacking the df:

# df[c(3,1)] selects Age and ColA, so unstack splits Age by ColA
cor(unstack(df[c(3,1)]))
#           Cat1       Cat2       Cat3
#Cat1  1.0000000 -1.0000000 -0.1889822
#Cat2 -1.0000000  1.0000000  0.1889822
#Cat3 -0.1889822  0.1889822  1.0000000
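
To make the reshaping step above easier to follow, here is a small sketch that writes the unstack formula out explicitly and keeps the intermediate wide table. The name `wide` and the rebuilt `df` are my own assumptions for illustration. Note that this pairs values by their row order within each category, so it relies on the dates appearing in the same order for every category.

```r
# Sample data rebuilt from the question (assumed layout)
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)

# Explicit formula: split the Age values by ColA, one column per category
wide <- unstack(df, Age ~ ColA)
wide

# Pairwise Pearson correlations between the category columns
cor(wide)
```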

Upvotes: 2

ThomasIsCoding

Reputation: 101034

Do you mean something like this?

> xtabs(Age ~ ., df)
      Date
ColA   06-05-2021 07-05-2021 08-05-2021
  Cat1         34         45         34
  Cat2         54         23         54
  Cat3         56         34         23

or, for the correlation matrix between categories:

> cor(t(xtabs(Age ~ ., df)))
           Cat1       Cat2       Cat3
Cat1  1.0000000 -1.0000000 -0.1889822
Cat2 -1.0000000  1.0000000  0.1889822
Cat3 -0.1889822  0.1889822  1.0000000
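
One thing worth keeping in mind: xtabs sums Age within each ColA/Date cell, which is fine here because every combination occurs exactly once. If a category could have several rows on the same date, a variant with tapply and mean may be closer to what you want. This is only a sketch under that assumption, and the name `m` and the rebuilt `df` are mine, not from the question.

```r
# Sample data rebuilt from the question (assumed layout)
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)

# Date x ColA matrix; each cell is the mean Age for that date and category,
# so values are aligned by Date rather than by row order
m <- tapply(df$Age, list(df$Date, df$ColA), mean)

# Correlate the category columns across dates
cor(m)
```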

Upvotes: 1
