Reputation: 334
I am trying to plot six histograms (2 columns of data (calories, sodium) x 3 types (beef, meat, poultry)) with these data, and I want to give them the same scale for the x and y axes. I'm using scale_x_continuous to limit the x axis, which, according to various sources, removes data that won't appear on the plot. Here is my code:
# src2_12.table is the data frame containing my data
histogram <- function(df, dataset, n_bins, label) {
  ggplot(df, aes(x = df[[dataset]])) +
    geom_histogram(color = "darkblue", fill = "lightblue", bins = n_bins) +
    xlab(label)
}
src2_12.beef <- src2_12.table[src2_12.table$Type == "Beef",]
src2_12.meat <- src2_12.table[src2_12.table$Type == "Meat",]
src2_12.poultry <- src2_12.table[src2_12.table$Type == "Poultry",]
src2_12.calories_scale <- lims(x = c(min(src2_12.table$Calories), max(src2_12.table$Calories)), y = c(0, 6))
src2_12.sodium_scale <- lims(x = c(min(src2_12.table$Sodium), max(src2_12.table$Sodium)), y = c(0, 6))
#src2_12.calories_scale <- lims()
#src2_12.sodium_scale <- lims()
src2_12.plots <- list(
histogram(src2_12.beef, "Calories", 10, "Calories-Beef") + src2_12.calories_scale,
histogram(src2_12.meat, "Calories", 10, "Calories-Meat") + src2_12.calories_scale,
histogram(src2_12.poultry, "Calories", 10, "Calories-Poultry") + src2_12.calories_scale,
histogram(src2_12.beef, "Sodium", 10, "Sodium-Beef") + src2_12.sodium_scale,
histogram(src2_12.meat, "Sodium", 10, "Sodium-Meat") + src2_12.sodium_scale,
histogram(src2_12.poultry, "Sodium", 10, "Sodium-Poultry") + src2_12.sodium_scale
)
multiplot(plotlist = src2_12.plots, cols = 2, layout = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE))
Here is what I get (image not shown) vs. what the data are supposed to look like (image not shown):
I can't understand why some data points are missing, given that the limits I set are already the min and the max of the data.
Upvotes: 2
Views: 2694
Reputation: 173737
You probably want to use coord_cartesian instead of lims. Unexpected things can happen when you're fiddling around with the limits on histograms, because a fair number of fiddly transformations have to happen to get from your raw data to the actual histogram.
Let's peer under the hood for one example:
p <- ggplot(src2_12.beef, aes(x = Calories)) +
  geom_histogram(bins = 10)
p1 <- ggplot(src2_12.beef, aes(x = Calories)) +
  geom_histogram(bins = 10) +
  lims(x = c(86, 195))
a <- ggplot_build(p)
b <- ggplot_build(p1)
> a$data[[1]][,1:5]
y count x xmin xmax
1 1 1 114.1111 109.7222 118.5000
2 0 0 122.8889 118.5000 127.2778
3 3 3 131.6667 127.2778 136.0556
4 2 2 140.4444 136.0556 144.8333
5 5 5 149.2222 144.8333 153.6111
6 2 2 158.0000 153.6111 162.3889
7 0 0 166.7778 162.3889 171.1667
8 2 2 175.5556 171.1667 179.9444
9 3 3 184.3333 179.9444 188.7222
10 2 2 193.1111 188.7222 197.5000
> b$data[[1]][,1:5]
y count x xmin xmax
1 0 0 NA NA 90.83333
2 0 0 96.88889 90.83333 102.94444
3 1 1 109.00000 102.94444 115.05556
4 0 0 121.11111 115.05556 127.16667
5 4 4 133.22222 127.16667 139.27778
6 4 4 145.33333 139.27778 151.38889
7 4 4 157.44444 151.38889 163.50000
8 1 1 169.55556 163.50000 175.61111
9 4 4 181.66667 175.61111 187.72222
10 2 2 193.77778 187.72222 NA
So now you're wondering, how the heck did that happen, right?
Well, when you tell ggplot that you want 10 bins and the x limits go from 86 to 195, the histogram algorithm tries to create ten bins that span that actual range. That's why it's trying to create bins down below 100 even though there's no data there.
And then further oddities can happen because the bars may extend past the nominal data range (the xmin and xmax values), since the bar widths will generally encompass a little above and a little below your actual data at the high and low ends.
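The edges in both printouts can be reproduced with a few lines of base R. This is only a sketch of arithmetic that happens to match the output above, not ggplot2's actual code: the function name bin_edges is made up, the unrestricted range of 111 to 190 for the beef Calories is a guess that is consistent with the first printout, and other ggplot2 versions may place bins differently.

```r
# Hypothetical helper, not part of ggplot2: compute bins + 1 bar edges for a
# histogram over the range [lo, hi].
bin_edges <- function(lo, hi, bins) {
  width    <- (hi - lo) / (bins - 1)        # wider than (hi - lo) / bins
  boundary <- width / 2                     # edges lie at width/2 + k * width
  shift    <- floor((lo - boundary) / width)
  origin   <- boundary + shift * width      # last such edge at or below lo
  origin + width * 0:bins
}

with_lims <- bin_edges(86, 195, 10)   # the limits passed to lims() above
no_lims   <- bin_edges(111, 190, 10)  # ASSUMED data range, see the note above

print(round(with_lims, 5))
print(round(no_lims, 4))
```

With the limits set, the outermost edges come out at about 78.72 and 199.83, both outside 86 to 195, which is consistent with the NA values in the second printout.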
coord_cartesian will adjust the x limits after all this processing has happened, so it bypasses all these little quirks.
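Applied to the question's setup, that could look roughly like the sketch below. The toy data frame is invented purely for illustration (it is not the asker's src2_12.table), and shared_view is a made-up name; the histogram helper follows the one in the question, with .data[[column]] swapped in for df[[dataset]].

```r
library(ggplot2)

# Toy data, invented for illustration only
toy <- data.frame(
  Type     = rep(c("Beef", "Poultry"), each = 5),
  Calories = c(111, 135, 149, 176, 190, 86, 102, 113, 132, 152)
)

histogram <- function(df, column, n_bins, label) {
  ggplot(df, aes(x = .data[[column]])) +
    geom_histogram(color = "darkblue", fill = "lightblue", bins = n_bins) +
    xlab(label)
}

# One coord object reused for every panel: it zooms the view after binning,
# so no observations are dropped.
shared_view <- coord_cartesian(xlim = range(toy$Calories), ylim = c(0, 6))

p_beef    <- histogram(toy[toy$Type == "Beef", ], "Calories", 10, "Calories-Beef") + shared_view
p_poultry <- histogram(toy[toy$Type == "Poultry", ], "Calories", 10, "Calories-Poultry") + shared_view

# Inspect the computed bins: every observation is still counted
built <- ggplot_build(p_beef)$data[[1]]
print(sum(built$count))
```

Each panel still computes its bins from its own data, so bar widths can differ between panels; passing the same binwidth or breaks to geom_histogram is one way to line them up.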
Upvotes: 5