MagicLettuce


R ggplot2: How to make a histogram and color it according to different columns?

I am trying to generate a histogram from some data, but I can't find a way to make ggplot2 do what I want.

For context, these are my data's column names:


|  Name  |  Total Enrichment % (A+B+C+D)  |  %A  |  %B  |  %C  |  %D  |

I want to generate a histogram showing the distribution of the Total Enrichment column, with each bar filled in 4 colors showing the contributions of %A, %B, %C and %D.

I've tried converting the data to long format, but I still can't get exactly what I want.

Any advice would be very helpful! Thank you very much!

Here is an example (not the full dataset, just a small part of it):

# Sample data: in each row, Total is the sum of A, B, C and D
dat <- read.table(text = "Name Total A B C D
1 0.1396104 0.029220779 0.009740260 0.029220779 0.07142857
2 0.1250000 0.010869565 0.021739130 0.016304348 0.07608696
3 0.1337580 0.006369427 0.000000000 0.025477707 0.10191083
4 0.1239669 0.016528926 0.024793388 0.033057851 0.04958678
5 0.1242938 0.011299435 0.016949153 0.039548023 0.05649718
6 0.1311475 0.000000000 0.000000000 0.021857923 0.10928962
7 0.1376147 0.004587156 0.004587156 0.004587156 0.12385321
8 0.1574074 0.046296296 0.018518519 0.032407407 0.06018519
9 0.1269036 0.010152284 0.010152284 0.020304569 0.08629442", sep = "", header = TRUE)
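For reference, converting this kind of data to long format could look like the sketch below. It assumes the tidyr package is installed and uses only the first three rows of the sample above; the column names `component` and `pct` are illustrative choices, not part of the original data.

```r
library(tidyr)

# First three rows of the sample: Total equals A + B + C + D in each row
dat <- read.table(text = "Name Total A B C D
1 0.1396104 0.029220779 0.009740260 0.029220779 0.07142857
2 0.1250000 0.010869565 0.021739130 0.016304348 0.07608696
3 0.1337580 0.006369427 0.000000000 0.025477707 0.10191083", header = TRUE)

# One row per (Name, component) pair, keeping Name and Total as identifiers
long <- pivot_longer(dat, cols = A:D, names_to = "component", values_to = "pct")
```

Each original row becomes four rows, so `long$pct` can be mapped to `y` and `long$component` to `fill` in a stacked bar plot.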

My goal is to create a histogram of the Total enrichment data, with each bar filled according to the contributing variables (A, B, C and D).

Thanks!

Edit

Thanks to StupidWolf's amazing help and comments, I got a little closer to what I want.

Here is what I've got so far (it's not perfect, but so far so good):

[image: current plot]

I would like the y axis on a logarithmic scale, since most of my data is in the lower range, but I'm also interested in the data with higher enrichment. Also, does anyone know why the bars are not completely filled? Why are there these white spaces?

Again, thank you very much for your help and patience!



StupidWolf

I am making an educated guess about what you want to do. First, let's simulate some data:

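set.seed(321)
library(ggplot2)
library(dplyr)
library(tidyr)  # needed for pivot_longer() below
# 500 rows of four random contributions A-D; Total is their row sum
dat = data.frame(Name=1:500,matrix(runif(500*4),ncol=4))
colnames(dat)[-1] = LETTERS[1:4]
dat$Total = rowSums(dat[,-1])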

If you want the contribution of A, B, C and D to each binned value of Total, we first need a histogram of Total (it looks like the one below), and we store its breaks to assign each row to a bin:

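# breaks = 40 is only a suggestion; hist() picks "pretty" break points near it
his_all = hist(dat$Total, breaks = 40)
# Label each row with the midpoint of its bin (include.lowest matches hist())
dat$bin = cut(dat$Total, breaks = his_all$breaks, labels = his_all$mids, include.lowest = TRUE)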

[image: histogram of Total]
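The factor-to-numeric step mentioned next matters because `as.numeric()` on a factor returns its internal level codes, not the label values. A small sketch on hypothetical toy values `x` with break points 0, 1, 2, 3 shows the difference:

```r
# Hypothetical toy values, binned at width 1; labels are the bin midpoints
x <- c(0.2, 0.7, 1.5, 2.9)
h <- hist(x, breaks = c(0, 1, 2, 3), plot = FALSE)
bin <- cut(x, breaks = h$breaks, labels = h$mids, include.lowest = TRUE)

as.numeric(bin)                # level codes: 1 1 2 3 (not the midpoints)
as.numeric(as.character(bin))  # midpoints: 0.5 0.5 1.5 2.5
```

Going through `as.character()` first recovers the midpoint labels, which is why the pipeline converts `bin` that way before plotting.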

Above, I used the midpoint of each histogram bin as the position at which to plot its bar again, hence the step that converts the factor labels back to numbers. Next, we calculate the contribution of A to D to each Total, pivot to long format, and plot:

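dat %>% 
  # Express A-D as fractions of Total, so each row's four fractions sum to 1
  mutate_at(c("A","B","C","D"), ~ .x / Total) %>% 
  pivot_longer(A:D) %>% 
  # Turn the bin-midpoint labels back into numbers for the x axis
  mutate(bin = as.numeric(as.character(bin))) %>% 
  # Stacked fractions add up to the number of rows per bin, i.e. the histogram count
  ggplot(aes(x = bin, y = value, fill = name)) + 
  geom_col() +
  xlab("enrichment")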

[image: stacked bars of the A to D contributions per Total bin]
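A possible variation, sketched under assumptions rather than tested against the real data: `geom_col()` draws bars at 90% of the data resolution by default, which leaves gaps between adjacent bins, and stacking one small segment per row can also leave thin seams. Summing the fractions per bin and component first, then setting `width` to the bin width, gives one solid segment per colour. `bin_width` and `by_bin` are illustrative names, and `across()` assumes dplyr 1.0 or later.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

set.seed(321)
dat <- data.frame(Name = 1:500, matrix(runif(500 * 4), ncol = 4))
colnames(dat)[-1] <- LETTERS[1:4]
dat$Total <- rowSums(dat[, -1])

# Same binning as above, but without drawing the base-R histogram
his_all <- hist(dat$Total, breaks = 40, plot = FALSE)
dat$bin <- cut(dat$Total, breaks = his_all$breaks,
               labels = his_all$mids, include.lowest = TRUE)
bin_width <- diff(his_all$breaks)[1]  # pretty() breaks are equally spaced

# One row per bin and component: summed fractions = that component's share of the count
by_bin <- dat %>%
  mutate(across(A:D, ~ .x / Total)) %>%
  pivot_longer(A:D) %>%
  mutate(bin = as.numeric(as.character(bin))) %>%
  group_by(bin, name) %>%
  summarise(value = sum(value), .groups = "drop")

# Full-width bars so neighbouring bins touch, as in a histogram
p <- ggplot(by_bin, aes(x = bin, y = value, fill = name)) +
  geom_col(width = bin_width) +
  xlab("enrichment")
```

Because each row's fractions sum to 1, the stacked height of each bar still equals the histogram count for that bin.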

Another way to visualize your data:

dat$interval = cut_interval(dat$Total,5)  # 5 equal-width ranges of Total

dat %>% 
  mutate_at(c("A","B","C","D"), ~ .x / Total) %>% 
  group_by(interval) %>% 
  select(c(interval, A:D)) %>% 
  # Mean fraction of each component within each interval
  summarize_all(mean) %>% 
  pivot_longer(-interval) %>%
  ggplot(aes(x = interval, y = value, fill = name)) + 
  geom_col()

[image: mean A/B/C/D proportions per Total interval]

This shows, for every range of Total, what proportion A/B/C/D contributes to it on average.
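Since each row's four fractions sum to 1, the mean fractions within an interval also sum to 1, so every bar in this plot should reach a height of 1. A quick numeric check, sketched on the same simulated data (`shares` and `bar_heights` are illustrative names; `across()` assumes dplyr 1.0 or later):

```r
library(ggplot2)  # for cut_interval()
library(dplyr)

set.seed(321)
dat <- data.frame(Name = 1:500, matrix(runif(500 * 4), ncol = 4))
colnames(dat)[-1] <- LETTERS[1:4]
dat$Total <- rowSums(dat[, -1])
dat$interval <- cut_interval(dat$Total, 5)

# Mean fraction of A-D within each of the 5 equal-width Total intervals
shares <- dat %>%
  mutate(across(A:D, ~ .x / Total)) %>%
  group_by(interval) %>%
  summarise(across(A:D, mean))

# Stacked bar height per interval: sum of the four mean fractions
bar_heights <- rowSums(shares[, c("A", "B", "C", "D")])
```

Printing `shares` gives the same numbers as the plot in table form.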
