SteveMcManaman

Reputation: 433

Plot a heatmap for binary categorical variables in R

I have a dataframe which contains many binary categorical variables, and I would like to display a heatmap-like plot for all the observations, only displaying two colors for "yes" and "no" levels. I would then like to sort it so that those observations (ID) with the most "yes" in their row appear on top.

The sample dataset is provided here:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                   var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                   var2 = c('no', 'yes', 'no', 'yes', 'no'),
                   var3 = c('yes', 'no', 'no', 'yes', 'yes'))
df1


  ID var1 var2 var3
1  1  yes   no  yes
2  2  yes  yes   no
3  3   no   no   no
4  4  yes  yes  yes
5  5   no   no  yes

I tried using the heatmap() function but I could not make it work. Can you please help me with that?

Upvotes: 0

Views: 2834

Answers (2)

denis

Reputation: 5673

If you want to use ggplot, you need to work in long format. I will use tidyverse here:


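library(tidyverse)  # loads dplyr, tidyr and ggplot2

# Reshape to long format: one row per ID/variable pair
# (pivot_longer() names the new columns "name" and "value" by default)
df_long <- df1 %>%
  pivot_longer(cols = paste0("var", 1:3))

# IDs sorted by decreasing number of "yes"
order <- df_long %>%
  group_by(ID) %>%
  summarise(n = sum(value == "yes")) %>%
  arrange(desc(n)) %>%
  pull(ID)

df_long %>%
  mutate(ID = factor(ID, levels = order)) %>%
  ggplot(aes(ID, name, fill = value)) +
  geom_tile()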


The order vector holds the IDs sorted by their number of "yes" values. Setting the levels of the ID factor to that vector makes ggplot draw the IDs in that sequence, so the heatmap is ordered by the number of "yes". Because ID is on the x-axis here, the IDs with the most "yes" appear on the left.
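To see what the ordering step produces without drawing anything, here is a minimal base-R sketch of the same idea. The names yes_count, id_levels and id_factor are made up for this illustration and are not part of the answer above.

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

# Number of "yes" per row (hypothetical helper name)
yes_count <- rowSums(df1[, -1] == "yes")

# IDs sorted by decreasing count; ties keep their original order
id_levels <- df1$ID[order(-yes_count)]

# The factor keeps the data as-is but records the desired display order
id_factor <- factor(df1$ID, levels = id_levels)
levels(id_factor)
```

ggplot places discrete values in the order of their factor levels, which is why setting the levels is enough to reorder the tiles.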

Upvotes: 2

Ottie

Reputation: 1030

You're on the right track with heatmap. Turn the "yes" / "no" columns of your df into a matrix of 0's and 1's and disable some of the defaults such as scaling and ordering.

mat1 <- 1*(df1[,-1]=="yes")

> mat1
     var1 var2 var3
[1,]    1    0    1
[2,]    1    1    0
[3,]    0    0    0
[4,]    1    1    1
[5,]    0    0    1

# You only need this step if you want the IDs to be shown beside the plot
# (taken from the ID column, which may differ from the row names in real data)

rownames(mat1) <- df1$ID

> mat1
  var1 var2 var3
1    1    0    1
2    1    1    0
3    0    0    0
4    1    1    1
5    0    0    1

# reorder the matrix by rowSums before plotting

heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA)

[Image: heatmap outcome]
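It may look odd that the rows are sorted in ascending order when the goal is to have the most "yes" on top. As far as I can tell, heatmap() draws the first row of the matrix at the bottom of the plot, so the last row ends up at the top. A small sketch to check the row order being passed in (row_order and sorted are names chosen for this illustration):

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

# 0/1 matrix with the IDs as row names
mat1 <- 1 * (df1[, -1] == "yes")
rownames(mat1) <- df1$ID

# Ascending by number of "yes": fewest first, most last
row_order <- order(rowSums(mat1))
sorted <- mat1[row_order, ]
rownames(sorted)

# The last row of `sorted` is drawn at the top of the plot
heatmap(sorted, scale = "none", Rowv = NA, Colv = NA)
```

If you would rather have the most "yes" at the bottom, sort in decreasing order instead, e.g. order(rowSums(mat1), decreasing = TRUE).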

You can change the colour scheme with the col parameter, for example:

heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA, col=c("lightgrey", "tomato"))

If you would prefer the plot to read left-to-right (one column per ID), just transpose the matrix:

heatmap(t(mat1[order(rowSums(mat1)),]), scale = "none", Rowv = NA, Colv = NA)

Upvotes: 2
