Benjamin Moss
Benjamin Moss

Reputation: 43

ggplot2: Automatic scaling to include complete contour lines in geom_density_2d

Hopefully this will be quick.

I have plotted the following chart using ggplot.

ggplot chart

with the code:

ggplot(ContourDummy,aes(x=Measure.Name1,y=Measure.Name2,colour=Category.Name))
+geom_density_2d()

My issue is that some of the contour lines are not complete.

Now if I scale my axis by adding the following...

+ scale_x_continuous(minor_breaks=0, breaks=seq(14,26,12),limits=c(14,26)) 
+ scale_y_continuous(minor_breaks=0, breaks=seq(50,100,50),limits=c(50,100)

I get the desired output.

But is there any way of automatically setting the limits? I want to be able to replicate this chart type automatically just by switching the data source, x, y and colour.

I don't particularly want to be fiddling around with scales every time.

Upvotes: 3

Views: 2061

Answers (1)

eipi10
eipi10

Reputation: 93851

Here's a function that expands the x and y ranges to include the maximum extent of the density contours. The function works as follows:

  1. Create a plot object with x and y ranges expanded well beyond the data range, so that we can be sure the plot will include complete contour lines.

  2. Use ggplot_build to determine the min and max x and y values among all the density contours.

  3. Set the x and y ranges of the plot to the min and max x and y values determined in step 2.

The exp parameter is there to expand the final range by a tiny amount (1% by default) because a small piece of contour line can still be cut off without that small bit of extra padding (in the example below, try plotting the mtcars data frame with exp=0 and you'll see what I mean).

d2d = function(data, var1, var2, col, exp=0.005) {

  # If the colour variable is numeric, convert to factor
  if(is.numeric(data[,col])) {
    data[,col] = as.factor(data[,col])
  }

  # Create plot, but expand x and y ranges well beyond data
  p=ggplot(data, aes_string(var1, var2, colour=col)) +
    geom_density_2d() +
    scale_x_continuous(limits=c(min(data[,var1]) - 2*diff(range(data[,var1])),
                                max(data[,var1]) + 2*diff(range(data[,var1])))) +
    scale_y_continuous(limits=c(min(data[,var2]) - 2*diff(range(data[,var2])),
                                max(data[,var2]) + 2*diff(range(data[,var2]))))

  # Get min and max x and y values among all density contours
  pb = ggplot_build(p)

  xyscales = lapply(pb$data[[1]][,c("x","y")], function(var) {
    rng = range(var)
    rng + c(-exp*diff(rng), exp*diff(rng))
  })

  # Set x and y ranges to include complete density contours
  ggplot(data, aes_string(var1, var2, colour=col)) +
    geom_density_2d() +
    scale_x_continuous(limits=xyscales[[1]]) +
    scale_y_continuous(limits=xyscales[[2]]) 
}

Try out the function on two built-in data sets:

d2d(mtcars, "wt","mpg", "cyl")
d2d(iris, "Petal.Width", "Petal.Length", "Species")

enter image description here

Here's what the plots look like with the default x and y ranges:

ggplot(mtcars, aes(wt, mpg, colour=factor(cyl))) + geom_density_2d()

ggplot(iris, aes(Petal.Width, Petal.Length, colour=Species)) + geom_density_2d()

enter image description here

If you want to control the number of axis tick marks as well, you can, for example, do something like this:

d2d = function(data, var1, var2, col, nx=5, ny=5, exp=0.01) {

  require(scales)

  # If the colour variable is numeric, convert to factor
  if(is.numeric(data[,col])) {
    data[,col] = as.factor(data[,col])
  }

  # Create plot, but expand x and y ranges well beyond data
  p=ggplot(data, aes_string(var1, var2, colour=col)) +
    geom_density_2d() +
    scale_x_continuous(limits=c(min(data[,var1]) - 2*diff(range(data[,var1])),
                                max(data[,var1]) + 2*diff(range(data[,var1])))) +
    scale_y_continuous(limits=c(min(data[,var2]) - 2*diff(range(data[,var2])),
                                max(data[,var2]) + 2*diff(range(data[,var2]))))

  # Get min and max x and y values among all density curves
  pb = ggplot_build(p)

  xyscales = lapply(pb$data[[1]][,c("x","y")], function(var) {
    rng = range(var)
    rng + c(-exp*diff(rng), exp*diff(rng))
  })

  # Set x and y ranges to include all of outer density curves
  ggplot(data, aes_string(var1, var2, colour=col)) +
    geom_density_2d() +
    scale_x_continuous(limits=xyscales[[1]], breaks=pretty_breaks(n=nx)) +
    scale_y_continuous(limits=xyscales[[2]], breaks=pretty_breaks(n=ny)) 
}

Upvotes: 5

Related Questions