Forklift17
Forklift17

Reputation: 2477

How to overlap R histograms

Reproduced from this code:

library(haven)
library(survey)
library(dplyr)

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA

nhanesAnalysis <- nhanesDemo %>%
  mutate(LowIncome = case_when(
    INDFMIN2 < 40 ~ T,
    T ~ F
  )) %>%
  # Select the necessary columns
  select(INDFMIN2, LowIncome, persWeight, psu, strata)

# Set up the design
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

svyhist(~log10(INDFMIN2), design=nhanesDesign, main = '')

enter image description here

How do I color the histogram by independent variable, say, LowIncome? I want to have two separate histograms, one for each value of LowIncome. Unfortunately I picked a bad example, but I want them to be see-through in case their values overlap.

Upvotes: 0

Views: 142

Answers (2)

Allan Cameron
Allan Cameron

Reputation: 173803

If you want to plot a histogram from your model, you can get its data from model.frame (this is what svyhist does under the hood). To get the histogram filled by group, you could use this data frame inside ggplot:

library(ggplot2)

ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
  geom_histogram(alpha = 0.5, color = "gray60", breaks = 0:20 / 10) +
  theme_classic() 

enter image description here


Edit

As Thomas Lumley points out, this does not incorporate sampling weights, so if you wanted this you could do:

ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
  geom_histogram(aes(weight = persWeight), alpha = 0.5, 
                 color = "gray60", breaks = 0:20 / 10) +
  theme_classic() 

enter image description here

To demonstrate this approach works, we can replicate Thomas's approach in ggplot using the data example from svyhist. To get the uneven bin sizes (if this is desired), we need two histogram layers, though I'm guessing this would not be required for most use-cases.

ggplot(model.frame(dstrat), aes(enroll)) +
  geom_histogram(aes(fill = "E", weight = pw, y = after_stat(density)),
                 data = subset(model.frame(dstrat), stype == "E"),
                 breaks = 0:35 * 100,
                 position = "identity", col = "gray50") +
  geom_histogram(aes(fill = "Not E", weight = pw, y = after_stat(density)),
                 data = subset(model.frame(dstrat), stype != "E"),
                 position = "identity", col = "gray50",
                 breaks = 0:7 * 500) +
  scale_fill_manual(NULL, values = c("#00880020", "#88000020")) +
  theme_classic()

enter image description here

Upvotes: 3

Thomas Lumley
Thomas Lumley

Reputation: 2765

You can't just extract the data and use ggplot, because that won't use the weights and so misses the whole point of svyhist. You can use the add=TRUE argument, though. You do need to set the x and y axis ranges correctly to make sure the whole plot is visible

Using the data example from ?svyhist

svyhist(~enroll, subset(dstrat,stype=="E"), col="#00880020",ylim=c(0,0.003),xlim=c(0,3500))
svyhist(~enroll, subset(dstrat,stype!="E"), col="#88000020",add=TRUE)

enter image description here

Upvotes: 1

Related Questions