Reputation: 385
Hello, I would like to create a heatmap presenting the co-frequency of several variables. Let's see some code:
a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)
df
     a b c d
[1,] 1 1 1 1
[2,] 1 1 1 0
[3,] 1 1 0 0
[4,] 1 0 0 0
'1' means the phenomenon occurred; '0' means the phenomenon did not appear.
The co-frequency of a and b is 75%, the co-frequency of a and c is 50% ...
Finally, I would like to have a 4x4 matrix with the column names on the x and y axes and the co-frequency percentage in the tiles: a vs. a = 100%, a vs. b = 75%, etc.
May I ask for a little help?
The solution from the comments gives:
library(tidyr)
library(ggplot2)
a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)
calc_freq <- function(x, y) {
  mean(df[, x] == df[, y] & df[, x] == 1 & df[, y] == 1)
}
mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
mat
dimnames(mat) <- list(colnames(df), colnames(df))
mat %>% as_tibble() %>% gather %>% ggplot() + aes(key, value) + geom_tile()
I would rather have the percentages from mat as the fill, and dimnames(mat) on the x-axis and y-axis.
Upvotes: 1
Views: 401
Reputation: 388807
There should be a function that does this directly; however, here is one base R approach using outer. We write a function which calculates the ratio:
calc_freq <- function(x, y) {
  # proportion of rows in which columns x and y are both 1
  mean(df[, x] == 1 & df[, y] == 1)
}
and apply it using outer
mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
mat
#     [,1] [,2] [,3] [,4]
#[1,] 1.00 0.75 0.50 0.25
#[2,] 0.75 0.75 0.50 0.25
#[3,] 0.50 0.50 0.50 0.25
#[4,] 0.25 0.25 0.25 0.25
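Since df contains only 0 and 1, the same matrix can probably also be obtained with a matrix product, which avoids the Vectorize call. This is only a sketch: it assumes df holds nothing but 0/1 values and no NA, and mat2 is just an illustrative name.

a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)

# crossprod(df) is t(df) %*% df: for each pair of columns it counts
# the rows where both are 1 (only valid for 0/1 data, an assumption here)
mat2 <- crossprod(df) / nrow(df)
mat2

A side effect of this variant is that the row and column names are taken from colnames(df), so no separate dimnames step is needed.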
If you want row and column names, we can use dimnames:
dimnames(mat) <- list(colnames(df), colnames(df))
Each entry of mat is the proportion of rows in which both columns contain 1 at the same position.
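To see what a single entry means, here is the a vs. b case worked by hand with the vectors from the question (both is just an illustrative name):

a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)

# TRUE only in the rows where a and b are both 1
both <- a == 1 & b == 1
both
# [1]  TRUE  TRUE  TRUE FALSE

# the mean of a logical vector is the share of TRUE values
mean(both)
# [1] 0.75

Three of the four rows have 1 in both columns, which gives the 75% mentioned in the question.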
To get the plot we can do
library(tidyverse)
data.frame(mat) %>%
  rownames_to_column() %>%
  gather(key, value, -rowname) %>%
  ggplot() + aes(rowname, key, fill = value) +
  geom_tile()
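If the percentages should also be printed inside the tiles, as asked in the question, one possible variant is to add geom_text on top of geom_tile. This is a sketch rather than the only way to do it: it uses only ggplot2 plus base R for the reshaping, and the names long, cofreq, label and p are illustrative.

library(ggplot2)

a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)

calc_freq <- function(x, y) mean(df[, x] == 1 & df[, y] == 1)
mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
dimnames(mat) <- list(colnames(df), colnames(df))

# as.table() + as.data.frame() gives one row per cell (Var1, Var2, Freq)
long <- as.data.frame(as.table(mat))
names(long) <- c("x", "y", "cofreq")

# text shown in each tile, e.g. "75%"
long$label <- paste0(round(100 * long$cofreq), "%")

p <- ggplot(long, aes(x, y, fill = cofreq)) +
  geom_tile() +
  geom_text(aes(label = label))
p

Both axes then show a, b, c and d, and the fill follows the co-frequency value.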
Upvotes: 2