Reputation: 13
I'm using histogram from the lattice package to plot two histograms conditioned on a variable with two levels: Male or Female.
histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) &
(raw$Housework_Tot_Min < 1000)] | raw$Gender)
Output of code: two histograms, minutes doing housework by gender
But when I actually look at the data, these histograms are not correct. When I plot each gender separately:
histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) &
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Female")])
and:
histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) &
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Male")])
I get two histograms again, but they look very different from the paneled ones.
Does anyone have insight on why these outputs don't match? I have a bunch more binary-type panels to plot, and having to do them separately really defeats the purpose of working with the lattice package!
I apologize if this betrays a fundamental misunderstanding of an easy concept; I'm still very much a beginner at R! Many thanks for the help.
Upvotes: 1
Views: 785
Reputation: 13
It turns out the problem was a data mismatch caused by the bracket exclusions: they were applied to the housework variable but not to Gender, so the two vectors no longer lined up. Instead of:
histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) &
(raw$Housework_Tot_Min < 1000)] | raw$Gender)
It should read:
histogram(~ Housework_Tot_Min [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)] |
Gender [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)], data = raw,
main = "Time Observed Housework by Gender",
xlab = "Minutes spent",
breaks = seq(from = 0, to = 400, by = 20))
Note that the exclusions are now applied to both the housework time and gender variables, eliminating the mismatches in the data.
The correct plot has been pasted below. Thanks again to all for the guidance.
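For anyone hitting the same thing, here is a minimal sketch of the mismatch and of an alternative fix that filters whole rows once, so the filter does not have to be repeated on each variable. The data frame raw_demo is a made-up stand-in (the real raw is not shown here), so treat the numbers as illustration only.

```r
library(lattice)

# Hypothetical stand-in for `raw`; the real data is not available here
set.seed(1)
raw_demo <- data.frame(
  Housework_Tot_Min = c(0, 0, sample(1:400, 18, replace = TRUE)),
  Gender = factor(rep(c("Female", "Male"), times = 10))
)

keep <- raw_demo$Housework_Tot_Min != 0 & raw_demo$Housework_Tot_Min < 1000

# The filtered vector is shorter than the unfiltered Gender column,
# so the two no longer describe the same rows
length(raw_demo$Housework_Tot_Min[keep])  # 18
length(raw_demo$Gender)                   # 20

# Filter whole rows first; both variables then stay aligned
clean <- subset(raw_demo, Housework_Tot_Min != 0 & Housework_Tot_Min < 1000)
p <- histogram(~ Housework_Tot_Min | Gender, data = clean,
               xlab = "Minutes spent",
               breaks = seq(from = 0, to = 400, by = 20))
print(p)
```

Filtering the data frame up front keeps the formula short, which should help when there are many more binary panels to plot.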
Upvotes: 0
Reputation: 5308
The problem is related to differing values in panel.args.common (i.e., the arguments common to all the panel functions, see ?trellis.object). Here is some sample code to clarify my point.
library(lattice)
## paneled plot
hist1 <- histogram( ~ Sepal.Width | Species, data = iris)
hist1$panel.args.common
# $breaks
# [1] 1.904 2.228 2.552 2.876 3.200 3.524 3.848 4.172 4.496
#
# $type
# [1] "percent"
#
# $equal.widths
# [1] TRUE
#
# $nint
# [1] 8
## single plot
hist2 <- histogram( ~ Sepal.Width, data = iris[iris$Species == "setosa", ])
hist2$panel.args.common
# $breaks
# [1] 2.216 2.540 2.864 3.188 3.512 3.836 4.160 4.484
#
# $type
# [1] "percent"
#
# $equal.widths
# [1] TRUE
#
# $nint
# [1] 7
nint (number of histogram bins, see ?histogram) and breaks (breakpoints of the bins) are calculated from all the data passed to the call, i.e. across all target panels, and therefore differ between hist1 (all three species) and hist2 (setosa only). If you want these arguments to be identical so that the two plots look similar, you just have to run the following line of code after the two plots have been created.
hist2$panel.args.common <- hist1$panel.args.common
## or vice versa, depending on the number of bins and breakpoints to use
library(gridExtra)
grid.arrange(hist1, hist2, ncol = 2)
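As an alternative sketch, you could also pass the same explicit breaks to both calls instead of copying panel.args.common afterwards. The breakpoints below are my own choice (they just need to span the full range of Sepal.Width), and the names hist_all and hist_setosa are only for illustration.

```r
library(lattice)

# Hand-picked breakpoints covering the whole range of Sepal.Width (2.0 to 4.4)
common_breaks <- seq(from = 2, to = 4.5, by = 0.25)

## paneled plot with explicit breaks
hist_all <- histogram(~ Sepal.Width | Species, data = iris,
                      breaks = common_breaks)

## single plot using the very same breaks
hist_setosa <- histogram(~ Sepal.Width,
                         data = iris[iris$Species == "setosa", ],
                         breaks = common_breaks)

## check that both objects now carry the same breakpoints
identical(hist_all$panel.args.common$breaks,
          hist_setosa$panel.args.common$breaks)
```

With the bins fixed up front, the setosa panel of the paneled plot and the single plot should use the same bin edges, which makes them directly comparable.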
Upvotes: 2