How can I split my density plots and histograms by data containing NA values versus not?

Question

I know how to split density plots by a binary variable (e.g. sex), but I want to overlay and compare density plots for rows that have NA in a specified column against rows that don't.

I have my data and then create subsets:

data_NA <- data[is.na(data$x4), ]
data_notNA <- data[!is.na(data$x4), ]

I then want to create histograms and density plots of the other variables to see how they are distributed differently in each subset.

What would I add to compare these histograms easily side-by-side for the different subsets?

sex_hist <- ggplot(data = data) + geom_histogram(mapping = aes(x=factor(sex)), stat="count") + scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) + xlab("Sex")

I could just make two and use grid.arrange(), but I was hoping there might be a neater way.

And how would I overlay age density plots for the different data subsets? For example:

density_DE_age <- ggplot(data = data, aes(x=age, fill = sex)) + geom_density(alpha = 0.5, position = 'identity')

(but split by whether x4 is NA, instead of by sex)

zephryl · Accepted Answer

Create a variable indicating whether x4 is missing, then facet by it.

data$x4_missing <- is.na(data$x4)

sex_hist <- ggplot(data = data) +
  geom_histogram(mapping = aes(x=factor(sex)), stat="count") +
  scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
  xlab("Sex") +
  facet_wrap(vars(x4_missing))
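
By default the facet strips will read FALSE and TRUE. As a minimal sketch of one way to get readable strip labels, the example below builds toy data (the values, the name `toy`, and the label text are all made up for illustration, not taken from the question) and passes a named vector to `as_labeller()`. It uses `geom_bar()`, which counts rows per category in the same way as `geom_histogram(stat = "count")`.

```r
library(ggplot2)

# Hypothetical toy data standing in for the asker's `data`
set.seed(1)
toy <- data.frame(
  sex = sample(1:2, 200, replace = TRUE),
  x4  = rnorm(200)
)
toy$x4[sample(200, 60)] <- NA          # make 60 of the x4 values missing
toy$x4_missing <- is.na(toy$x4)        # same indicator as in the answer

# One panel per missingness group, with descriptive strip labels
sex_hist_by_missing <- ggplot(toy, aes(x = factor(sex))) +
  geom_bar() +
  scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
  xlab("Sex") +
  facet_wrap(
    vars(x4_missing),
    labeller = as_labeller(c("FALSE" = "x4 observed", "TRUE" = "x4 missing"))
  )
```

If the two groups differ a lot in size, adding `scales = "free_y"` to `facet_wrap()` may make the shapes easier to compare.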

# If sex is stored as the numbers 1/2, use fill = factor(sex) so it is treated as a group
density_DE_age <- ggplot(data = data, aes(x=age, fill = sex)) +
  geom_density(alpha = 0.5, position = 'identity') +
  facet_wrap(vars(x4_missing))
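
The question also asks about overlaying the densities by missingness rather than by sex. Faceting puts the groups in separate panels; to draw them on top of each other instead, one option is to map the indicator to `fill`. The following is only a sketch on made-up data: `toy`, `x4_status`, and the label text are hypothetical names chosen for the example.

```r
library(ggplot2)

# Hypothetical toy data standing in for the asker's `data`
set.seed(1)
toy <- data.frame(
  age = round(rnorm(200, mean = 45, sd = 12)),
  x4  = rnorm(200)
)
toy$x4[sample(200, 60)] <- NA          # make 60 of the x4 values missing

# Labelled factor so the legend reads well instead of showing TRUE/FALSE
toy$x4_status <- factor(
  is.na(toy$x4),
  levels = c(FALSE, TRUE),
  labels = c("x4 observed", "x4 missing")
)

# Two semi-transparent densities in one panel, one per missingness group
density_age_by_missing <- ggplot(toy, aes(x = age, fill = x4_status)) +
  geom_density(alpha = 0.5, position = "identity") +
  labs(x = "Age", fill = NULL)
```

Because each density is normalised to integrate to 1, the two curves stay comparable even when one group is much smaller than the other.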
