Niels Hagenbuch

Reputation: 143

Rotate histogram in R or overlay a density in a barplot

I would like to rotate a histogram in R that is plotted with hist(). The question is not new, and in several forums I have read that it is not possible. However, all of these answers date back to 2010 or even earlier.

Has anyone found a solution in the meantime?

One way around the problem is to plot the histogram via barplot(), which offers the option horiz=TRUE. The plot works fine, but I fail to overlay a density on the bar plots. The problem probably lies in the x-axis: in the vertical plot the density is centered in the first bin, while in the horizontal plot the density curve is messed up.

Any help is very much appreciated!

Thanks,

Niels

Code:

require(MASS)
Sigma <- matrix(c(2.25, 0.8, 0.8, 1), 2, 2)
mvnorm <- mvrnorm(1000, c(0,0), Sigma)

scatterHist.Norm <- function(x,y) {
 zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
 layout(zones, widths=c(2/3,1/3), heights=c(1/3,2/3))
 xrange <- range(x) ; yrange <- range(y)
 par(mar=c(3,3,1,1))
 plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
 xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
 yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
 top <- max(c(xhist$counts, yhist$counts))
 par(mar=c(0,3,1,1))
 plot(xhist, axes=FALSE, ylim=c(0,top), main="", col="grey")
 x.xfit <- seq(min(x),max(x),length.out=40)
 x.yfit <- dnorm(x.xfit,mean=mean(x),sd=sd(x))
 x.yfit <- x.yfit*diff(xhist$mids[1:2])*length(x)
 lines(x.xfit, x.yfit, col="red")
 par(mar=c(0,3,1,1))
 plot(yhist, axes=FALSE, ylim=c(0,top), main="", col="grey", horiz=TRUE)
 y.xfit <- seq(min(x),max(x),length.out=40)
 y.yfit <- dnorm(y.xfit,mean=mean(x),sd=sd(x))
 y.yfit <- y.yfit*diff(yhist$mids[1:2])*length(x)
 lines(y.xfit, y.yfit, col="red")
}
scatterHist.Norm(mvnorm[,1], mvnorm[,2])


scatterBar.Norm <- function(x,y) {
 zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
 layout(zones, widths=c(2/3,1/3), heights=c(1/3,2/3))
 xrange <- range(x) ; yrange <- range(y)
 par(mar=c(3,3,1,1))
 plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
 xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
 yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
 top <- max(c(xhist$counts, yhist$counts))
 par(mar=c(0,3,1,1))
 barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
 x.xfit <- seq(min(x),max(x),length.out=40)
 x.yfit <- dnorm(x.xfit,mean=mean(x),sd=sd(x))
 x.yfit <- x.yfit*diff(xhist$mids[1:2])*length(x)
 lines(x.xfit, x.yfit, col="red")
 par(mar=c(3,0,1,1))
 barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
 y.xfit <- seq(min(x),max(x),length.out=40)
 y.yfit <- dnorm(y.xfit,mean=mean(x),sd=sd(x))
 y.yfit <- y.yfit*diff(yhist$mids[1:2])*length(x)
 lines(y.xfit, y.yfit, col="red")
}
scatterBar.Norm(mvnorm[,1], mvnorm[,2])

Source of scatter plot with marginal histograms (click first link after "adapted from..."):

http://r.789695.n4.nabble.com/newbie-scatterplot-with-marginal-histograms-done-and-axes-labels-td872589.html

Source of density in a scatter plot:

http://www.statmethods.net/graphs/density.html

Upvotes: 12

Views: 17231

Answers (5)

Berry Boessenkool

Reputation: 1538

I'm not sure whether it is of interest, but I sometimes want to use horizontal histograms without any packages and be able to write or draw at any position of the graphic.

That's why I wrote the following function, with examples provided below. If anyone knows a package to which this would fit well, please write me: berry-b at gmx.de

Please be sure not to have a variable hpos in your workspace, as it will be overwritten with a function. (Yes, for a package I would need to insert some safety parts in the function).

horiz.hist <- function(Data, breaks="Sturges", col="transparent", las=1, 
ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat, border=par("fg"), ... )
  {a <- hist(Data, plot=FALSE, breaks=breaks)
  HBreaks <- a$breaks
  HBreak1 <- a$breaks[1]
  hpos <<- function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/ diff(range(HBreaks))
  barplot(a$counts, space=0, horiz=TRUE, ylim=hpos(ylim), col=col, border=border,...)
  axis(2, at=hpos(labelat), labels=labels, las=las, ...) 
  print("use hpos() to address y-coordinates") }

Examples:

# Data and basic concept
set.seed(8); ExampleData <- rnorm(50,8,5)+5
hist(ExampleData)
horiz.hist(ExampleData, xlab="absolute frequency") 
# Caution: the labels at the y-axis are not the real coordinates!
# abline(h=2) will draw above the second bar, not at the label value 2. Use hpos:
abline(h=hpos(11), col=2)

# Further arguments
horiz.hist(ExampleData, xlim=c(-8,20)) 
horiz.hist(ExampleData, main="the ... argument worked!", col.axis=3) 
hist(ExampleData, xlim=c(-10,40)) # with xlim
horiz.hist(ExampleData, ylim=c(-10,40), border="red") # with ylim
horiz.hist(ExampleData, breaks=20, col="orange")
axis(2, hpos(0:10), labels=FALSE, col=2) # another use of hpos()

One shortcoming: the function doesn't work with breakpoints provided as a vector with different widths of the bars.
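
The coordinate mapping behind hpos() can also be written as a small closure, which avoids the global assignment mentioned above. This is only a minimal sketch under the same equal-width assumption; make_hpos and hpos_local are hypothetical names used for illustration, not part of the function above.

```r
# Hypothetical helper: returns a function that maps data values to the
# y-coordinates used by barplot(..., space=0, horiz=TRUE).
# Assumes equal-width bins, so that bar i spans [i-1, i].
make_hpos <- function(breaks) {
  n_bars <- length(breaks) - 1
  function(pos) (pos - breaks[1]) * n_bars / diff(range(breaks))
}

set.seed(8)
ExampleData <- rnorm(50, 8, 5) + 5
h <- hist(ExampleData, plot = FALSE)
hpos_local <- make_hpos(h$breaks)

barplot(h$counts, space = 0, horiz = TRUE)
axis(2, at = hpos_local(pretty(h$breaks)), labels = pretty(h$breaks), las = 1)
abline(h = hpos_local(11), col = 2)  # line at the data value 11
```

Because the mapping lives in a local variable, nothing in the workspace is overwritten.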

Upvotes: 3

mathlete

Reputation: 6692

scatterBarNorm <- function(x, dcol="blue", lhist=20, num.dnorm=5*lhist, ...){
    ## check input
    stopifnot(ncol(x)==2)
    ## set up layout and graphical parameters
    layMat <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
    layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
    ospc <- 0.5 # outer space
    pext <- 4 # par extension down and to the left
    bspc <- 1 # space between scatter plot and bar plots
    par. <- par(mar=c(pext, pext, bspc, bspc),
                oma=rep(ospc, 4)) # plot parameters
    ## scatter plot
    plot(x, xlim=range(x[,1]), ylim=range(x[,2]), ...)
    ## determine barplot and height parameter
    ## histogram (for barplot-ting the density)
    xhist <- hist(x[,1], plot=FALSE, breaks=seq(from=min(x[,1]), to=max(x[,1]),
                                     length.out=lhist))
    yhist <- hist(x[,2], plot=FALSE, breaks=seq(from=min(x[,2]), to=max(x[,2]),
                                     length.out=lhist)) # note: hist() always returns $density, which is used below
    ## determine the plot range and all the things needed for the barplots and lines
    xx <- seq(min(x[,1]), max(x[,1]), length.out=num.dnorm) # evaluation points for the overlaid density
    xy <- dnorm(xx, mean=mean(x[,1]), sd=sd(x[,1])) # density points
    yx <- seq(min(x[,2]), max(x[,2]), length.out=num.dnorm)
    yy <- dnorm(yx, mean=mean(x[,2]), sd=sd(x[,2]))
    ## barplot and line for x (top)
    par(mar=c(0, pext, 0, 0))
    barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density, xy)),
            space=0) # barplot
    lines(seq(from=0, to=lhist-1, length.out=num.dnorm), xy, col=dcol) # line
    ## barplot and line for y (right)
    par(mar=c(pext, 0, 0, 0))
    barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density, yy)),
            space=0, horiz=TRUE) # barplot
    lines(yy, seq(from=0, to=lhist-1, length.out=num.dnorm), col=dcol) # line
    ## restore parameters
    par(par.)
}

require(mvtnorm)
X <- rmvnorm(1000, c(0,0), matrix(c(1, 0.8, 0.8, 1), 2, 2))
scatterBarNorm(X, xlab=expression(italic(X[1])), ylab=expression(italic(X[2])))


Upvotes: 21

Niels Hagenbuch

Reputation: 143

Thank you, Tim and Paul. You made me think harder and use what hist() actually provides.

This is my solution now (with great help from Alex Pl.):

scatterBar.Norm <- function(x,y) {
 zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
 layout(zones, widths=c(5/7,2/7), heights=c(2/7,5/7))
 xrange <- range(x)
 yrange <- range(y)
 par(mar=c(3,3,1,1))
 plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
 xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
 yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
 top <- max(c(xhist$density, yhist$density))
 par(mar=c(0,3,1,1))
 barplot(xhist$density, axes=FALSE, ylim=c(0, top), space=0)
 x.xfit <- seq(min(x),max(x),length.out=40)
 x.yfit <- dnorm(x.xfit, mean=mean(x), sd=sd(x))
 x.barpos <- seq(from=0, to=19, length.out=40) # barplot coordinates: 19 bars with space=0 span 0..19
 lines(x.barpos, x.yfit, col="red") # avoids dividing by zero when rescaling x.xfit
 par(mar=c(3,0,1,1))
 barplot(yhist$density, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
 y.xfit <- seq(min(y),max(y),length.out=40)
 y.yfit <- dnorm(y.xfit, mean=mean(y), sd=sd(y))
 y.barpos <- seq(from=0, to=19, length.out=40) # barplot coordinates: 19 bars with space=0 span 0..19
 lines(y.yfit, y.barpos, col="red") # density on the x-axis, bar position on the y-axis
}

Example:

require(MASS)
#Sigma <- matrix(c(2.25, 0.8, 0.8, 1), 2, 2)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
mvnorm <- mvrnorm(1000, c(0,0), Sigma) ; scatterBar.Norm(mvnorm[,1], mvnorm[,2])

A Sigma with unequal variances (such as the commented-out one) leads to a somewhat bulkier histogram on the respective axis.

The code is left deliberately "inelegant" in order to increase comprehensibility (for myself when I revisit it later...).

Niels

Upvotes: 2

tim riffe

Reputation: 5691

It may be helpful to know that the hist() function invisibly returns all the information that you need to reproduce what it does using simpler plotting functions, like rect().

    vals <- rnorm(10)
    A <- hist(vals)
    A
    $breaks
    [1] -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5

    $counts
    [1] 1 3 3 1 1 1

    $intensities
    [1] 0.2 0.6 0.6 0.2 0.2 0.2

    $density
    [1] 0.2 0.6 0.6 0.2 0.2 0.2

    $mids
    [1] -1.25 -0.75 -0.25  0.25  0.75  1.25

    $xname
    [1] "vals"

    $equidist
    [1] TRUE

    attr(,"class")
    [1] "histogram"

You can create the same histogram manually like this:

    plot(NULL, type = "n", ylim = c(0,max(A$counts)), xlim = c(range(A$breaks)))
    rect(A$breaks[1:(length(A$breaks) - 1)], 0, A$breaks[2:length(A$breaks)], A$counts)

With those parts, you can flip the axes however you like:

    plot(NULL, type = "n", xlim = c(0, max(A$counts)), ylim = c(range(A$breaks)))
    rect(0, A$breaks[1:(length(A$breaks) - 1)], A$counts, A$breaks[2:length(A$breaks)])
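
The same idea extends to the density overlay asked about in the question. The following is a minimal sketch, assuming the density scale (A$density) is used instead of counts so that a fitted normal curve needs no rescaling. The seed, sample size, and the names nb, yy, and dens are illustrative choices.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
vals <- rnorm(200)
A <- hist(vals, plot = FALSE)
nb <- length(A$breaks)

# Fitted normal curve, evaluated over the range of the breaks
yy <- seq(min(A$breaks), max(A$breaks), length.out = 100)
dens <- dnorm(yy, mean = mean(vals), sd = sd(vals))

# Horizontal histogram on the density scale:
# bar lengths run along x, data values along y
plot(NULL, type = "n", xlim = c(0, max(A$density, dens)),
     ylim = range(A$breaks), xlab = "density", ylab = "vals")
rect(0, A$breaks[-nb], A$density, A$breaks[-1], col = "grey")

# Overlay: density on x, data value on y (swapped relative to the usual call)
lines(dens, yy, col = "red")
```

Since rect() draws in data coordinates, the curve and the bars share the same axes and no coordinate conversion is needed.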

For similar do-it-yourselfing with density(), see: Axis-labeling in R histogram and density plots; multiple overlays of density plots

Upvotes: 5

Paul Hiemstra

Reputation: 60964

When using ggplot2, flipping axes works very well. See, for instance, this example, which shows how to do this for a boxplot; I assume it works equally well for a histogram. In ggplot2 one can quite easily overlay different plot types (geometries, in ggplot2 jargon), so combining a density plot and a histogram should be easy.
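
A rough sketch of what this could look like, assuming ggplot2 version 3.3.0 or later is installed (needed for after_stat()). The data frame df, the column name value, and the bin count are illustrative assumptions, not taken from the question.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 for after_stat()

set.seed(1)       # arbitrary seed, only for reproducibility
df <- data.frame(value = rnorm(1000))

p <- ggplot(df, aes(x = value)) +
  # histogram on the density scale, so the normal curve needs no rescaling
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 fill = "grey", colour = "black") +
  # fitted normal density drawn over the bars
  stat_function(fun = dnorm,
                args = list(mean = mean(df$value), sd = sd(df$value)),
                colour = "red") +
  # swap the axes so the bars run horizontally
  coord_flip()
print(p)
```

Both layers are defined in the usual orientation, and coord_flip() rotates them together, so the curve stays aligned with the bars.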

Upvotes: 0
