jwint

Reputation: 13

Conditional Histograms Using Lattice Package, Output Plots Incorrect

I'm using histogram from the lattice package to plot two histograms conditioning on a variable with two options: Male or Female.

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

Output of code: two histograms, minutes doing housework by gender

But, when I actually look at the data, these histograms are not correct. By plotting:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Female")])

and:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Male")])

I get two histograms again, but they look very different from the conditioned ones.

Does anyone have insight on why these outputs don't match? I have a bunch more binary-type panels to plot, and having to do them separately really defeats the purpose of working with the lattice package!

I apologize if this betrays a fundamental misunderstanding of an easy concept; I'm still very much a beginner at R. Many thanks for the help.

Upvotes: 1

Views: 785

Answers (2)

jwint

Reputation: 13

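It turns out the issue was a data mismatch caused by the exclusions applied with the brackets: the filter was applied to Housework_Tot_Min but not to Gender, so the filtered housework values no longer lined up with the gender values. Instead of: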

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

It should read:

histogram(~ Housework_Tot_Min [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)] | 
        Gender [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)], data = raw,
      main = "Time Observed Housework by Gender",
      xlab = "Minutes spent",
      breaks = seq(from = 0, to = 400, by = 20))

Note that the exclusions are now applied to both the housework time and gender variables, eliminating the mismatches in the data.
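As a possible alternative to repeating the bracket condition on every variable, the rows can be filtered once and the cleaned data frame passed to histogram. The sketch below is only an illustration: the small `raw` data frame is a made-up stand-in for the real data, and `keep` and `clean` are hypothetical names.

```r
library(lattice)

# Hypothetical stand-in for the real `raw` data frame (values are invented)
raw <- data.frame(
  Housework_Tot_Min = c(0, 30, 45, 1200, 90, 0, 150, 60),
  Gender = factor(c("Male", "Female", "Female", "Male",
                    "Male", "Female", "Female", "Male"))
)

# Build the exclusion once so it applies to whole rows,
# keeping Housework_Tot_Min and Gender aligned
keep  <- raw$Housework_Tot_Min != 0 & raw$Housework_Tot_Min < 1000
clean <- raw[keep, ]

# Option 1: plot from the filtered data frame
p1 <- histogram(~ Housework_Tot_Min | Gender, data = clean,
                xlab = "Minutes spent")

# Option 2: let lattice filter rows through its `subset` argument,
# which is evaluated inside `data`
p2 <- histogram(~ Housework_Tot_Min | Gender, data = raw,
                subset = Housework_Tot_Min != 0 & Housework_Tot_Min < 1000,
                xlab = "Minutes spent")

print(p1)
```

Either way the condition is written once and is applied row-wise, so the conditioning variable cannot drift out of step with the plotted variable.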

The correct plot has been pasted below. Thanks again to all for the guidance.

Updated Histogram

Upvotes: 0

fdetsch

Reputation: 5308

The problem is related to differing values in panel.args.common (i.e., the arguments common to all the panel functions, see ?trellis.object). Here is some sample code to clarify my point.

library(lattice)

## paneled plot
hist1 <- histogram( ~ Sepal.Width | Species, data = iris)
hist1$panel.args.common

# $breaks
# [1] 1.904 2.228 2.552 2.876 3.200 3.524 3.848 4.172 4.496
# 
# $type
# [1] "percent"
#
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 8

## single plot    
hist2 <- histogram( ~ Sepal.Width, data = iris[iris$Species == "setosa", ])
hist2$panel.args.common

# $breaks
# [1] 2.216 2.540 2.864 3.188 3.512 3.836 4.160 4.484
# 
# $type
# [1] "percent"
# 
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 7

nint (number of histogram bins, see ?histogram) and breaks (breakpoints of the bins) are calculated across all target panels, and therefore vary between hist1 and hist2. If you want these arguments to be identical so that the two plots look similar, you just have to run the following line of code after the two plots have been created.

hist2$panel.args.common <- hist1$panel.args.common
## or vice versa, depending on the number of bins and breakpoints to use

library(gridExtra)
grid.arrange(hist1, hist2, ncol = 2)

histogram
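Another way to get matching bins, sketched here as an assumption rather than as part of the original answer, is to pass the same explicit breaks to both calls instead of copying panel.args.common afterwards. The name `shared_breaks` and the chosen range (2 to 4.5 in steps of 0.25, which covers all Sepal.Width values in iris) are illustrative.

```r
library(lattice)

# One set of breakpoints used for every plot (hypothetical choice of range)
shared_breaks <- seq(from = 2, to = 4.5, by = 0.25)

# Paneled plot and single-species plot, both with the same bins
hist_all <- histogram(~ Sepal.Width | Species, data = iris,
                      breaks = shared_breaks)
hist_setosa <- histogram(~ Sepal.Width,
                         data = iris[iris$Species == "setosa", ],
                         breaks = shared_breaks)

# Both objects should now carry the same breakpoints
all.equal(hist_all$panel.args.common$breaks,
          hist_setosa$panel.args.common$breaks)
```

Fixing the breaks up front keeps the bins comparable across any number of plots, which may be convenient when many binary panels have to be drawn.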

Upvotes: 2
