Seymoo

Reputation: 189

Density plot for multiple groups in ggplot

I have seen example1, "How to overlay density plots in R?", and "Overlapped density plots in ggplot2" on how to make density plots, and I can make one with the code from the second link. However, I am wondering how to make such a graph in ggplot or plotly. I have looked at all the examples but cannot adapt them to my problem. I have a toy data frame of gene expression data (leukemia data description), whose columns refer to two groups of individuals:

leukemia_big <- read.csv("http://web.stanford.edu/~hastie/CASI_files/DATA/leukemia_big.csv")

df <- data.frame(class= ifelse(grepl("^ALL", colnames(leukemia_big),
                 fixed = FALSE), "ALL", "AML"), row.names = colnames(leukemia_big))

plot(density(as.matrix(leukemia_big[,df$class=="ALL"])), 
     lwd=2, col="red")
lines(density(as.matrix(leukemia_big[,df$class=="AML"])), 
      lwd=2, col="darkgreen")

Upvotes: 3

Views: 2068

Answers (1)

Nicolás Velasquez

Reputation: 5898

ggplot requires tidy data, also known as a long-format data frame. The following example reshapes the data and plots it. Be careful, though: in the provided dataset the values are distributed almost identically for both patient types, so when you plot the ALL and AML patients the curves overlap and you cannot see a difference.

library(tidyverse)

leukemia_big %>%
  as_tibble() %>% # Optional, makes df a tibble, which makes debugging easier (as_data_frame() is deprecated)
  gather(key = patient, value = value, 1:72) %>% # transforms a wide df into a tidy or long df
  mutate(type = gsub('[.].*$', '', patient)) %>% # creates a variable with the type of patient
  ggplot(aes(x = value, fill = type)) +
  geom_density(alpha = 0.5)

[Image: results with original data]
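As a side note, gather() is superseded in current tidyr, so here is a rough sketch of the same reshaping with pivot_longer(). To keep it runnable offline it uses a small made-up data frame called toy (my assumption, not the real data) with column names like ALL, ALL.1, AML, AML.1, which is the naming the gsub pattern above implies.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for leukemia_big: rows are genes, columns are patients.
# The column names mimic the de-duplicated names ALL, ALL.1, AML, AML.1, ...
set.seed(1)
toy <- data.frame(
  ALL   = rnorm(100),
  ALL.1 = rnorm(100),
  AML   = rnorm(100),
  AML.1 = rnorm(100)
)

long <- toy %>%
  # wide -> long: one row per (patient, value) pair
  pivot_longer(cols = everything(), names_to = "patient", values_to = "value") %>%
  # strip the ".1", ".2", ... suffix to recover the patient type
  mutate(type = sub("[.].*$", "", patient))

# One semi-transparent density curve per patient type
p <- ggplot(long, aes(x = value, fill = type)) +
  geom_density(alpha = 0.5)
p
```

With the real data you would replace toy with leukemia_big. Since the question also mentions plotly: if the plotly package is installed, plotly::ggplotly(p) should turn the same plot object into an interactive one, though that is not tested here.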

In this second example I add 1 unit to the value variable for all AML patients, to show visually that the two curves were overlapping:

leukemia_big %>%
  as_tibble() %>% # Optional, makes df a tibble, which makes debugging easier (as_data_frame() is deprecated)
  gather(key = patient, value = value, 1:72) %>% # transforms a wide df into a tidy or long df
  mutate(type = gsub('[.].*$', '', patient)) %>% # creates a variable with the type of patient
  mutate(value2 = if_else(condition = type == "ALL", true = value, false = value + 1)) %>% # shifts AML values by 1 to make the overlap visible
  ggplot(aes(x = value2, fill = type)) +
  geom_density(alpha = 0.5)

[Image: results with modified data for AML type patients]
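If you would rather check the overlap numerically than shift one group, a per-group summary is a quick sanity check. This is only a sketch on simulated data: the data frame long below is a hypothetical long-format table with the same type and value columns as above, not the leukemia data itself.

```r
library(dplyr)

# Hypothetical long-format data: both groups drawn from the same distribution,
# which mimics the "almost identical" situation described above.
set.seed(42)
long <- data.frame(
  type  = rep(c("ALL", "AML"), each = 500),
  value = c(rnorm(500), rnorm(500))
)

# Mean, standard deviation and median per patient type
stats <- long %>%
  group_by(type) %>%
  summarise(
    mean   = mean(value),
    sd     = sd(value),
    median = median(value)
  )
stats
```

If the rows of stats are close to each other, overlapping density curves are what you should expect, and the plot is not hiding a real difference.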

Upvotes: 6
