I have a column with different categories. Can I get a correlation matrix among Cat1, Cat2 and Cat3 with respect to Age?
df
ColA  Date        Age
Cat1  06-05-2021  34
Cat1  07-05-2021  45
Cat1  08-05-2021  34
Cat2  06-05-2021  54
Cat2  07-05-2021  23
Cat2  08-05-2021  54
Cat3  06-05-2021  56
Cat3  07-05-2021  34
Cat3  08-05-2021  23
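To make the example reproducible, the sample data can be built directly in R. This is a sketch that stores Date as a character string, which is an assumption since the question does not say what type the column is:

```r
# Sample data from the question; Date is kept as character (assumed type)
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)
```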
Using dcast from data.table:
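library(data.table)
# setDT() converts df to a data.table in place; dcast() then builds one row per
# within-category position (rowid) and one column per category.
# Dropping the id column (named ColA here) leaves only the Age columns for cor().
cor(dcast(setDT(df), rowid(ColA) ~ ColA, value.var = 'Age')[, ColA := NULL])
#            Cat1       Cat2       Cat3
# Cat1  1.0000000 -1.0000000 -0.1889822
# Cat2 -1.0000000  1.0000000  0.1889822
# Cat3 -0.1889822  0.1889822  1.0000000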
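The rowid(ColA) ~ ColA formula pairs rows by their position within each category, so it assumes every category lists its dates in the same order. A base-R sketch that pivots on Date instead is shown below; it swaps in stats::reshape() for data.table's dcast(), and the names wide, m and r are illustrative:

```r
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)

# Long -> wide: one row per Date, one Age.<category> column per ColA value
wide <- reshape(df, idvar = "Date", timevar = "ColA", direction = "wide")

# Keep only the Age columns and strip the "Age." prefix from their names
m <- as.matrix(wide[, -1])
colnames(m) <- sub("^Age\\.", "", colnames(m))

# cor() correlates columns, i.e. categories, across the shared dates
r <- cor(m)
r
```

If a category were missing a date, reshape() would leave an NA in that cell; cor(m, use = "pairwise.complete.obs") is one way to handle that.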
You can use cor after unstacking df:
# df[c(3, 1)] puts Age first and ColA second, so unstack() splits Age by ColA
cor(unstack(df[c(3, 1)]))
#            Cat1       Cat2       Cat3
# Cat1  1.0000000 -1.0000000 -0.1889822
# Cat2 -1.0000000  1.0000000  0.1889822
# Cat3 -0.1889822  0.1889822  1.0000000
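Here unstack() groups Age by ColA and, only when every category has the same number of rows, returns the groups as columns of a data frame (otherwise it returns a list, which cor() cannot use). Like the rowid() approach, it pairs values by row position. A base-R sketch with split() makes that step explicit; by_cat, m and r are illustrative names:

```r
df <- data.frame(
  ColA = rep(c("Cat1", "Cat2", "Cat3"), each = 3),
  Date = rep(c("06-05-2021", "07-05-2021", "08-05-2021"), times = 3),
  Age  = c(34, 45, 34, 54, 23, 54, 56, 34, 23)
)

# split() returns a named list with one Age vector per category, in row order
by_cat <- split(df$Age, df$ColA)

# All vectors have length 3, so they can be bound as columns of a matrix
m <- do.call(cbind, by_cat)

r <- cor(m)
r
```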
Do you mean something like this?
> xtabs(Age ~ ., df)
      Date
ColA   06-05-2021 07-05-2021 08-05-2021
  Cat1         34         45         34
  Cat2         54         23         54
  Cat3         56         34         23
and then, since cor() correlates columns, transpose the table so that each category is a column:
> cor(t(xtabs(Age ~ ., df)))
           Cat1       Cat2       Cat3
Cat1  1.0000000 -1.0000000 -0.1889822
Cat2 -1.0000000  1.0000000  0.1889822
Cat3 -0.1889822  0.1889822  1.0000000
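All three answers end in cor(), which by default computes the Pearson coefficient between each pair of columns. As a quick hand check, using the ages the question gives for Cat1 and Cat3 on the three dates, the off-diagonal value can be reproduced from the formula directly (cat1, cat3 and r_manual are illustrative names):

```r
# Ages on 06-, 07- and 08-05-2021, taken from the question
cat1 <- c(34, 45, 34)
cat3 <- c(56, 34, 23)

# Pearson r = sum of cross-deviations / sqrt(product of sums of squared deviations)
dx <- cat1 - mean(cat1)
dy <- cat3 - mean(cat3)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

round(r_manual, 7)  # -0.1889822, the same value cor(cat1, cat3) gives
```

With only three dates per category, every coefficient rests on three points, which is why Cat1 and Cat2 come out as exactly -1.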