JotHa

Reputation: 55

Subsetting data for ggplot2

I have data saved in multiple datasets, each consisting of four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, Weighted Average Income. I would like to create a graph where you see only one country's weighted average income by birthyear and split by male/female. I've used the facet_grid() function to get a grid of graphs for all countries as below.

ggplot() + 
 geom_line(data = dt,
           aes(x = Birthyear, 
               y = Weighted Average Income,
               colour = 'Weighted Average Income'))+
 facet_grid(Country ~ Male/Female)

However, when I try to isolate the graphs for just one country, the code below doesn't seem to work. How can I subset the data correctly?

ggplot() + 
 geom_line(data = dt[Country == 'Germany'],
           aes(x = Birthyear, 
               y = Weighted Average Income,
               colour = 'Weighted Average Income'))+
 facet_grid(Country ~ Male/Female)

Upvotes: 0

Views: 1376

Answers (1)

Oliver

Reputation: 8572

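In your specific case the problem is that Male/Female and Weighted Average Income are not quoted. Neither is a syntactically valid R name: R reads Male/Female as Male divided by Female and cannot parse Weighted Average Income at all, so both have to be wrapped in backticks or turned into symbols with sym(). Also, your data and basic aesthetics should likely be passed to ggplot() rather than geom_line(). Supplying them in geom_line() restricts them to that single layer, so you would have to repeat them in every layer you add, for example geom_smooth().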

So to fix your problem you could do

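library(data.table) # needed for the dt[Country == 'Germany'] subsetting
library(tidyverse)
plot <- ggplot(data = dt[Country == 'Germany'], 
       aes(x = Birthyear, 
           y = !!sym("Weighted Average Income"),
           col = !!sym("Weighted Average Income")
       )) + # sym() must be unquoted with !!; could use `Weighted Average Income` (backticks) instead
  geom_line() + 
  facet_grid(rows = vars(Country), 
             cols = vars(!!sym("Male/Female"))) # Could use Country ~ `Male/Female` instead
plot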
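
For anyone who wants to try this without the original data, here is a minimal sketch using the backtick form. The toy table and its values are made up for illustration; only the column names follow the question. It also adds a geom_point() layer to show that layers inherit the data and aesthetics given to ggplot().

```r
library(data.table)
library(ggplot2)

# Hypothetical toy data; only the column names mirror the question.
toy <- data.table(
  Country = rep(c("Germany", "France"), each = 6),
  `Male/Female` = rep(rep(c("Male", "Female"), each = 3), times = 2),
  Birthyear = rep(c(1950, 1960, 1970), times = 4),
  `Weighted Average Income` = c(30, 34, 39, 24, 29, 35,
                                28, 33, 37, 23, 27, 32)
)

# Data and aesthetics live in ggplot(), so every layer inherits them.
p_germany <- ggplot(
  data = toy[Country == "Germany"],
  aes(x = Birthyear, y = `Weighted Average Income`)
) +
  geom_line() +
  geom_point() + # no data/aes repeated here
  facet_grid(Country ~ `Male/Female`) # backticks for the non-syntactic name

p_germany
```

Backticks are the simplest choice when the column names are typed by hand; !!sym() is more useful when the names arrive as strings.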

ggplot2 also has a lesser-known built-in way to replace the data of an existing plot, so if you wanted to compare this to the plot with all of your countries included you could do:

plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
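
As a rough, self-contained illustration of the data swap, the sketch below builds a one-country plot and then reuses it with the full data. The toy data frame and its values are invented, and a plain data.frame stands in for the data.table from the question. How `%+%` is treated can differ between ggplot2 versions, so check help("+.gg") for the one you have installed.

```r
library(ggplot2)

# Hypothetical toy data; check.names = FALSE keeps the non-syntactic names.
toy <- data.frame(
  Country = rep(c("Germany", "France"), each = 6),
  `Male/Female` = rep(rep(c("Male", "Female"), each = 3), times = 2),
  Birthyear = rep(c(1950, 1960, 1970), times = 4),
  `Weighted Average Income` = c(30, 34, 39, 24, 29, 35,
                                28, 33, 37, 23, 27, 32),
  check.names = FALSE
)

# Plot for a single country.
p_germany <- ggplot(
  toy[toy$Country == "Germany", ],
  aes(x = Birthyear, y = `Weighted Average Income`)
) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)

# Same layers and facets, but drawn from the full data set.
p_all <- p_germany %+% toy

p_all
```

Because the facet specification travels with the plot, the second plot gains one row of panels per country without any other change.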

Upvotes: 1
