Rad

Reputation: 1029

How do I plot boxplots of two different series?

I have two data frames that share the same row IDs but have different columns.

Here is an example

  chrom     coord               sID      CM0016      CM0017    CM0018
7     10   3178881 SP_SA036,SP_SA040 0.000000000 0.000000000 0.0009923
8     10  38894616 SP_SA036,SP_SA040 0.000434783 0.000467464 0.0000970
9     11 104972190 SP_SA036,SP_SA040 0.497802888 0.529319536 0.5479003

and

   chrom     coord            sID      CM0001      CM0002      CM0003
4     10   3178881 SP_SA036,SA040 0.526806527 0.544927536 0.565610860
5     10  38894616 SP_SA036,SA040 0.009049774 0.002849003 0.002857143
6     11 104972190 SP_SA036,SA040 0.451612903 0.401617251 0.435318275

I am trying to create a composite boxplot figure with chrom and coord combined on the x axis (so 3 x values in this example) and, for each x value, two boxplots side by side, one for each data frame.

What is the best way of doing this? Should I merge the two data frames into one and loop over the boxplot rendering three columns at a time?

Any idea how this can be done?

The problem is that the two dataframes have the same number of rows but can differ in number of columns

>  dim(A)
[1] 99 20
>  dim(B)
[1] 99 28

I was thinking about transposing the data frames in order to get the same number of columns, but I got lost on how to do this properly. Thanks in advance.

UPDATE

This is what I tried to do

I think it solved my problem, but the boxplot looks very busy with 99 x values and 2 boxplots for each.

Upvotes: 0

Views: 1908

Answers (1)

MrFlick

Reputation: 206232

So if these are your input tables

d1<-structure(list(chrom = c(10L, 10L, 11L), 
coord = c(3178881L, 38894616L, 104972190L), 
sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SP_SA040", class = "factor"), 
    CM0016 = c(0, 0.000434783, 0.497802888), CM0017 = c(0, 0.000467464, 
    0.529319536), CM0018 = c(0.0009923, 9.7e-05, 0.5479003)), .Names = c("chrom", 
"coord", "sID", "CM0016", "CM0017", "CM0018"), class = "data.frame", row.names = c("7", 
"8", "9"))

d2<-structure(list(chrom = c(10L, 10L, 11L), coord = c(3178881L, 
38894616L, 104972190L), sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SA040", class = "factor"), 
    CM0001 = c(0.526806527, 0.009049774, 0.451612903), CM0002 = c(0.544927536, 
    0.002849003, 0.401617251), CM0003 = c(0.56561086, 0.002857143, 
    0.435318275)), .Names = c("chrom", "coord", "sID", "CM0001", 
"CM0002", "CM0003"), class = "data.frame", row.names = c("4", 
"5", "6"))

Then I would combine and reshape the data to make it easier to plot. Here's what I'd do, using melt() from the reshape2 package:

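library(reshape2)  # provides melt()
# Turn the sample columns into rows, keeping the ID columns
m1<-melt(d1, id.vars=c("chrom", "coord", "sID"))
m2<-melt(d2, id.vars=c("chrom", "coord", "sID"))
# Tag each table with its source, then stack them into one data frame
mm<-rbind(cbind(m1, s="T1"), cbind(m2, s="T2"))
# Factor of "chrom:coord" labels, with levels sorted by chrom, then coord
mm$pos<-factor(paste(mm$chrom,mm$coord,sep=":"),
    levels=do.call(paste, c(unique(mm[order(mm[[1]],mm[[2]]),1:2]), sep=":")))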

I first melt the two input tables to turn columns into rows. Because each sample column becomes a set of rows, it does not matter that the two tables have different numbers of columns. Then I add a column to each table so I know where the data came from and rbind them together. Finally, I do a bit of messy work to make a factor out of the chrom/coord pairs, sorted in the correct order.
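To make the reshaping step concrete, here is a minimal base R sketch of the same idea without reshape2. The data frames A and B, the sample columns S1 to S5, and the helper to_long() are all hypothetical and only stand in for two tables with the same rows but different numbers of sample columns; this is an illustration of the long format, not a replacement for melt().

```r
# Hypothetical toy data: same rows, different numbers of sample columns
A <- data.frame(chrom = c(10L, 10L, 11L),
                coord = c(3178881L, 38894616L, 104972190L),
                S1 = c(0.1, 0.2, 0.3), S2 = c(0.2, 0.3, 0.4))
B <- data.frame(chrom = c(10L, 10L, 11L),
                coord = c(3178881L, 38894616L, 104972190L),
                S3 = c(0.5, 0.6, 0.7), S4 = c(0.6, 0.7, 0.8),
                S5 = c(0.7, 0.8, 0.9))

# Hypothetical helper: one output row per (input row, sample column) pair
to_long <- function(d, label, id = c("chrom", "coord")) {
  vals <- d[setdiff(names(d), id)]               # sample columns only
  data.frame(d[rep(seq_len(nrow(d)), times = ncol(vals)), id],
             variable = rep(names(vals), each = nrow(d)),
             value = unlist(vals, use.names = FALSE),
             s = label,                          # which table the row came from
             row.names = NULL)
}

# Both tables now have identical columns, so they stack cleanly
long <- rbind(to_long(A, "T1"), to_long(B, "T2"))
long$pos <- paste(long$chrom, long$coord, sep = ":")
```

After this, long has one row per measurement, so each pos value simply carries more values for the table with more samples, which is what a boxplot grouped by pos and s needs.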

With all that done, I'll make the plot like this:

library(ggplot2)
# One box per table (s) at each position, drawn side by side
ggplot(mm, aes(x=pos, y=value, color=s)) +
    geom_boxplot(position="dodge")

and it looks like

[Image: resulting boxplot]

Upvotes: 2
