I'm trying to create a scatterplot with marginal histograms, as in this question. My data are two numeric variables that share seven discrete, (somewhat) logarithmically spaced levels.
I've successfully done this with the help of ggMarginal from the ggExtra package. However, I'm not happy with the outcome: when I plot the marginal histograms using the same data as the scatterplot, things don't line up.
As can be seen below, the histogram bars are shifted slightly to the right or left of the data points themselves.
library(ggExtra)  # ggMarginal() is provided by the ggExtra package
library(ggplot2)
x <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,7,12,18,12,7,3))
y <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,1,13,28,13,1,3))
d <- data.frame("x" = x,"y" = y)
p1 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(), type = "histogram")
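One likely reason the bars drift is visible from the levels themselves: on the log10 scale the gaps between neighbouring levels shrink steadily, so a histogram with equal-width bins can only approximately centre a bin on every level. A quick base R check (the names lv and gaps are just for illustration):

```r
# The seven levels on the log10 scale and the gaps between neighbours
lv <- log10(1:7)
gaps <- diff(lv)

# Gaps shrink from level to level, so no single bin width fits them all
round(gaps, 3)
# [1] 0.301 0.176 0.125 0.097 0.079 0.067
```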
A possible solution may be to change the variables used in the histograms into factors, so that the bars are nicely aligned with the scatterplot axes.
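This works well when creating the plot with ggplot alone (a factor is discrete, so geom_bar() is the geom to use; current ggplot2 versions reject a discrete x in geom_histogram()):
p2 <- ggplot(data.frame(lapply(d, as.factor)), aes(x = x)) + geom_bar()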
However, when I try to do this with ggMarginal, I do not get the desired result: the ggMarginal histogram appears to still treat my variables as numeric.
p3 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(),
x = as.factor(x), y = as.factor(y), type = "histogram")
How can I ensure my histogram bars are centred over the data points?
I'm absolutely willing to accept an answer that does not involve ggMarginal.
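A sketch of one ggMarginal-free approach, assuming ggplot2 and cowplot are installed: count the observations at each level, draw one narrow geom_col() bar centred on it, use the same axis limits for the scatterplot and the margins, and assemble the panels with cowplot::plot_grid(). The helper level_counts() and the values of lims_x, lims_y and bar_w are illustrative choices for this data, not part of any package.

```r
library(ggplot2)
library(cowplot)  # provides plot_grid()

x <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 1, 13, 28, 13, 1, 3))
d <- data.frame(x = x, y = y)

# Hypothetical helper: number of observations at each distinct level
level_counts <- function(v) {
  lv <- sort(unique(v))
  data.frame(level = lv, n = tabulate(match(v, lv)))
}
cx <- level_counts(d$x)
cy <- level_counts(d$y)

# Shared limits so the marginal bars sit on the same scale as the points
lims_x <- range(d$x) + c(-0.05, 0.05)
lims_y <- range(d$y) + c(-0.05, 0.05)
bar_w <- 0.03  # narrower than the smallest gap between levels (about 0.067)

sp <- ggplot(d, aes(x, y)) +
  geom_point() +
  coord_cartesian(xlim = lims_x, ylim = lims_y, expand = FALSE) +
  theme_bw()

# Top margin: geom_col() centres each bar on its x value
xplot <- ggplot(cx, aes(level, n)) +
  geom_col(width = bar_w) +
  coord_cartesian(xlim = lims_x, expand = FALSE) +
  theme_void()

# Right margin: same bars, flipped so they run horizontally
yplot <- ggplot(cy, aes(level, n)) +
  geom_col(width = bar_w) +
  coord_flip(xlim = lims_y, expand = FALSE) +
  theme_void()

plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
          rel_widths = c(2, 1), rel_heights = c(1, 2))
```

Because the bars stay on a numeric axis, no factor conversion is needed; bar_w only has to be narrower than the smallest gap between levels so neighbouring bars don't overlap.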
I'm not sure whether it is a good idea to replicate here the answer I gave to the question you mentioned, but I can't comment yet; please let me know if this is inappropriate.
I've found a package, ggpubr, that seems to work very well for this problem and offers several ways to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools):
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of showing separate marginal plots for different groups, the tutorial says this about ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
Then I ran this code:
library(ggpubr)  # ggscatter(), ggdensity(), border(), rotate(), clean_theme(), rremove()
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
                color = "Species", palette = "jco",
                size = 3, alpha = 0.6) +
  border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
                   palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
                   palette = "jco") +
  rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
          rel_widths = c(2, 1), rel_heights = c(1, 2))
This worked fine for me.
If you are willing to give base plotting a try, here is a function:
scatterWithHists <- function(x, y, histCols=c("lightblue","lightblue"), lhist=20, xlim=range(x), ylim=range(y), ...){
  ## set up layout and graphical parameters
  layMat <- matrix(c(1,4,3,2), ncol=2)
  layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
  ospc <- 0.5 # outer space
  pext <- 4   # par extension down and to the left
  bspc <- 1   # space between scatter plot and bar plots
  par. <- par(mar=c(pext, pext, bspc, bspc), oma=rep(ospc, 4)) # plot parameters
  on.exit({par(par.); layout(1)}) # restore margins and a single-panel layout on exit
  ## marginal histogram of x (top), drawn with barplot()
  xhist <- hist(x, breaks=seq(xlim[1], xlim[2], length.out=lhist), plot=FALSE)
  par(mar=c(0, pext, 0, 0))
  barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density)), space=0, col=histCols[1])
  ## marginal histogram of y (right), drawn horizontally
  yhist <- hist(y, breaks=seq(ylim[1], ylim[2], length.out=lhist), plot=FALSE)
  par(mar=c(pext, 0, 0, 0))
  barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density)), space=0, col=histCols[2], horiz=TRUE)
  ## overlaid density curves of x and y (top-right corner)
  dx <- density(x)
  dy <- density(y)
  par(mar=c(0, 0, 0, 0))
  plot(dx, col=histCols[1], xlim=range(c(dx$x, dy$x)), ylim=range(c(dx$y, dy$y)),
       lwd=4, type="l", main="", xlab="", ylab="", yaxt="n", xaxt="n", bty="n"
  )
  points(dy, col=histCols[2], type="l", lwd=3)
  ## scatter plot
  par(mar=c(pext, pext, 0, 0))
  plot(x, y, xlim=xlim, ylim=ylim, ...)
}
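If the goal is bars centred on each discrete level rather than binned histograms, one variation on the same base-graphics layout idea (my adaptation, not part of the function above) is to draw the marginal counts with segments() on exactly the same xlim/ylim as the scatter plot, with matching margins so the panels line up. The helper level_counts() is hypothetical.

```r
x <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 1, 13, 28, 13, 1, 3))

# Hypothetical helper: number of observations at each distinct level
level_counts <- function(v) {
  lv <- sort(unique(v))
  list(level = lv, n = tabulate(match(v, lv)))
}
cx <- level_counts(x)
cy <- level_counts(y)
xlim <- range(x)
ylim <- range(y)

old <- par(no.readonly = TRUE)
# Panel 1 = top bars, 2 = right bars, 3 = scatter; top-right cell left empty
layout(matrix(c(1, 3, 0, 2), ncol = 2), widths = c(5/7, 2/7), heights = c(2/7, 5/7))

# Top: same xlim and same left/right margins as the scatter plot
par(mar = c(0, 4, 1, 0))
plot.new()
plot.window(xlim = xlim, ylim = c(0, max(cx$n)))
segments(cx$level, 0, cx$level, cx$n, lwd = 8, lend = 1)

# Right: same ylim and same bottom/top margins as the scatter plot
par(mar = c(4, 0, 0, 1))
plot.new()
plot.window(xlim = c(0, max(cy$n)), ylim = ylim)
segments(0, cy$level, cy$n, cy$level, lwd = 8, lend = 1)

# Scatter plot
par(mar = c(4, 4, 0, 0))
plot(x, y, xlim = xlim, ylim = ylim)

par(old)
```

Since plot.window() and plot() apply the same default axis expansion to the same limits, each segment sits directly above or beside its column or row of points.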
Just do:
scatterWithHists(x,y, histCols=c("lightblue","orange"))
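And you get the scatter plot with the two marginal histograms, plus the two density curves overlaid in the top-right corner.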
If you absolutely want to use ggMarginal, then look up its xparams and yparams arguments. The documentation says you can pass additional arguments to the x-margin and y-margin plots through them. I was only successful in passing trivial things like color, but maybe passing something like xlim would help.
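For reference, a sketch of the xparams/yparams syntax, assuming a ggExtra version whose ggMarginal() accepts these arguments. Only fill colours are shown, since colour settings are what the answer reports working; other parameters are untested here.

```r
library(ggplot2)
library(ggExtra)

x <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 1, 13, 28, 13, 1, 3))
d <- data.frame(x = x, y = y)

p <- ggplot(d, aes(x, y)) + geom_point() + theme_bw()

# xparams goes only to the top margin, yparams only to the right margin
p_marg <- ggMarginal(p, type = "histogram",
                     xparams = list(fill = "lightblue"),
                     yparams = list(fill = "orange"))
p_marg
```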