luser

Reputation: 355

Plotting marginal histograms (as factors) and scatterplot (as numeric) from the same variable in R

I'm trying to create a scatterplot with marginal histograms as in this question. My data are two (numeric) variables which share seven discrete (somewhat) logarithmically-spaced levels.

I've done this with ggMarginal from the ggExtra package, but I'm not happy with the outcome: when the marginal histograms are drawn from the same data as the scatterplot, things don't line up. As can be seen below, the histogram bars sit slightly to the right or left of the data points themselves.

library(ggExtra)  # ggMarginal() lives in the ggExtra package
library(ggplot2)
x <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,7,12,18,12,7,3))
y <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,1,13,28,13,1,3))
d <- data.frame("x" = x,"y" = y)
p1 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(), type = "histogram")

p1
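A rough base R check of where the offset comes from. The levels are log-spaced, so the gaps between them are unequal, and equal-width bins cannot all be centred on them. The 30-bin grid below is only an illustration (30 is the geom_histogram default bin count); it is not claimed to reproduce ggplot2's exact bin placement.

```r
# Levels shared by both variables in the question
lv <- log10(1:7)

# Gaps between neighbouring levels shrink as the values grow
gaps <- diff(lv)
print(round(gaps, 3))

# Illustrative grid: 30 equal-width bins spanning the data range
edges   <- seq(min(lv), max(lv), length.out = 31)
centres <- (head(edges, -1) + tail(edges, -1)) / 2

# Distance from each level to its nearest bin centre
offset <- sapply(lv, function(v) min(abs(centres - v)))
print(round(offset, 4))
```

Any non-zero entry in offset is a level whose bar would be drawn off-centre on this grid.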

A possible solution may be to change the variables used in the histograms into factors, so they are nicely aligned with the scatterplot axes. This works well when creating histograms using ggplot:

# geom_bar() counts rows per factor level; current ggplot2 rejects geom_histogram() on a discrete x
p2 <- ggplot(data.frame(lapply(d, as.factor)), aes(x = x)) + geom_bar()

p2

However, when I try to do this using ggMarginal, I do not get the desired result - it appears that the ggMarginal histogram is still treating my variables as numeric.

p3 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(),
                 x = as.factor(x), y = as.factor(y), type = "histogram")

p3

How can I ensure my histogram bars are centred over the data points?

I'm absolutely willing to accept an answer which does not involve use of ggMarginal.

Upvotes: 3

Views: 2503

Answers (2)

Alf Pascu

Reputation: 365

I'm not sure it is a good idea to repeat here the answer I gave to the question you mentioned, but I don't yet have enough reputation to comment. Please let me know if I should do otherwise.

I found a package, ggpubr, that seems to work very well for this problem and offers several ways to display the data.

The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.

I first installed the package (it requires devtools)

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

For the particular example of displaying different histograms for different groups, the tutorial says of ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install cowplot first:

install.packages("cowplot")

And I followed this piece of code:

library(ggpubr)  # provides ggscatter(), ggdensity(), border(), rotate(), rremove(), clean_theme()
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
                color = "Species", palette = "jco",
                size = 3, alpha = 0.6) +
  border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
                   palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
                   palette = "jco") +
  rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
          rel_widths = c(2, 1), rel_heights = c(1, 2))

Which worked fine for me:

[Image: Iris set marginal histograms scatterplot]

Upvotes: 5

Karolis Koncevičius

Reputation: 9656

If you are willing to give base graphics a try, here is a function:

scatterWithHists <- function(x, y, histCols=c("lightblue","lightblue"), lhist=20, xlim=range(x), ylim=range(y), ...){
  ## set up layout and graphical parameters
  layMat <- matrix(c(1,4,3,2), ncol=2)
  layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
  ospc <- 0.5                                                  # outer space
  pext <- 4                                                    # par extension down and to the left
  bspc <- 1                                                    # space between scatter plot and bar plots
  par. <- par(mar=c(pext, pext, bspc, bspc), oma=rep(ospc, 4)) # plot parameters

  ## barplot and line for x (top)
  xhist <- hist(x, breaks=seq(xlim[1], xlim[2], length.out=lhist), plot=FALSE)
  par(mar=c(0, pext, 0, 0))
  barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density)), space=0, col=histCols[1])

  ## barplot and line for y (right)
  yhist <- hist(y, breaks=seq(ylim[1], ylim[2], length.out=lhist), plot=FALSE)
  par(mar=c(pext, 0, 0, 0))
  barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density)), space=0, col=histCols[2], horiz=TRUE)

  ## overlap
  dx <- density(x)
  dy <- density(y)
  par(mar=c(0, 0, 0, 0))
  plot(dx, col=histCols[1], xlim=range(c(dx$x, dy$x)), ylim=range(c(dx$y, dy$y)),
       lwd=4, type="l", main="", xlab="", ylab="", yaxt="n", xaxt="n", bty="n"
       )
  points(dy, col=histCols[2], type="l", lwd=3)

  ## scatter plot
  par(mar=c(pext, pext, 0, 0))
  plot(x, y, xlim=xlim, ylim=ylim, ...)
}

Just do:

scatterWithHists(x,y, histCols=c("lightblue","orange"))

And you get:

[Image: marginalHists]
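A note on the lhist argument, as a small base R sketch using the data from the question: lhist is the number of break points passed to hist(), so the function draws lhist - 1 equal-width bars per margin.

```r
# x variable from the question
x <- rep(log10(1:7), times = c(3, 7, 12, 18, 12, 7, 3))

lhist <- 20                                      # default in the function above
brks  <- seq(min(x), max(x), length.out = lhist) # same break rule as the function
h     <- hist(x, breaks = brks, plot = FALSE)

print(length(h$counts))            # number of bars in the margin
print(sum(h$counts) == length(x))  # every observation falls in some bin
```

Because these bins are equally wide, changing lhist changes only how many bars there are. With the log-spaced levels from the question, the bars would still not all be centred on the points.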

If you absolutely want to use ggMarginal, then look up its xparams and yparams arguments. The documentation says you can use them to send additional arguments to the x-margin and y-margin plots. I was only successful in sending trivial things like color, but maybe sending something like xlim would help.
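One thing that might be worth trying along the xparams/yparams route is to pass explicit bin edges. This is only a sketch: centred_breaks() is a hypothetical helper (not part of any package) that builds one narrow bin around each distinct level, with empty bins in between. That ggMarginal forwards breaks to the marginal histogram layer is an assumption I have not verified. The base R part runs on its own; the ggMarginal part is attempted only if ggExtra and ggplot2 are installed.

```r
# Data from the question
x <- rep(log10(1:7), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(1:7), times = c(3, 1, 13, 28, 13, 1, 3))
d <- data.frame(x = x, y = y)

# Hypothetical helper: one bin of width w centred on every distinct level,
# where w is a fraction (< 1) of the smallest gap so bins never overlap
centred_breaks <- function(v, frac = 0.8) {
  lv <- sort(unique(v))
  w  <- frac * min(diff(lv))
  sort(c(lv - w / 2, lv + w / 2))
}

bx <- centred_breaks(d$x)
by <- centred_breaks(d$y)

# Base R check: the midpoints of the occupied bins are the levels themselves
hx <- hist(d$x, breaks = bx, plot = FALSE)
print(hx$mids[hx$counts > 0])
print(hx$counts[hx$counts > 0])

# Assumption: xparams/yparams pass `breaks` through to the histogram layers
if (requireNamespace("ggExtra", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y)) + geom_point() + theme_bw()
  print(ggExtra::ggMarginal(p, type = "histogram",
                            xparams = list(breaks = bx),
                            yparams = list(breaks = by)))
}
```

The bars come out narrow, with gaps between them, because their width is tied to the smallest gap between levels. That is the price of keeping every bar exactly centred on log-spaced values.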

Upvotes: 2
