Jake L

Reputation: 1067

Filter a piped df within ggplot

I am using a dplyr pipeline to clean my df and then feed it directly into ggplot. However, I want to plot only one group at a time, so I need to filter to just that group. The problem is that I want the scales to stay constant, as if all groups were present. Is it possible to further filter a piped df inside the ggplot() call? Example below.

# packages used by the pipelines below
library(dplyr)
library(ggplot2)

# create df
set.seed(1)
df <- data.frame(matrix(nrow=100,ncol=5)) 
colnames(df) <- c("year","group","var1","var2","var3") 
df$year <- rep(1:4,each=25)
df$group <- rep(c("a","b","c","d","e"),times=20)
df$var1 <- runif(100,min=0,max=30)
df$var2 <- sample(1:500,100,replace=TRUE) 
df$var2[1:25] <- sample(1:100,25,replace=TRUE)   # year 1 gets a narrower var2 range
df$var3 <- runif(100,min=0,max=100)

Now pipe it to clean it (here we're just doing some random stuff to it), then plot:

df %>%
  filter(var3 < 80) %>%   # random thing 1 - filter some stuff
  filter(var2 < 400) %>%   # random thing 2 - filter more
  mutate(var2 = as.numeric(var2)) %>%  # random thing 3 - mutate a column
  ggplot(aes(x=group,y=var1,color=var2)) + 
  geom_point()

I want to plot only one year at a time (from the "year" column), ideally in a loop over the years, while keeping the colorbar scaled to the values in the full df.

Here's what I tried so far:

dlist <- c(1:4)   #list of years
i <- 2    #current year

df %>%
  filter(var3 < 80) %>%
  filter(var2 != 56) %>%
  mutate(var2 = as.numeric(var2)) %>%
  filter(year %in% dlist[i]) %>%   # so I can filter for year here, but that makes the colorbar in the ggplot scale for this subset individually, which is no good. 
  ggplot(aes(x=group,y=var1,color=var2)) + 
  geom_point()

I think there should be a way to use . and %>% within the ggplot parentheses so that the scale remains... but I can't quite figure it out.

dlist <- c(1:4)   #list of years
i <- 2    #current year

df %>%
  filter(var3 < 80) %>%
  filter(var2 != 56) %>%
  mutate(var2 = as.numeric(var2)) %>%
  ggplot(data = .%>%filter(year %in% dlist[i]), aes(x=group,y=var1,color=var2)) + 
  geom_point()

But that gives me this error:

Error: You're passing a function as global data.
Have you misspelled the `data` argument in `ggplot()`

What is the best way to do this?

Upvotes: 4

Views: 1815

Answers (2)

Jon Spring

Reputation: 66480

You might plot one invisible layer with the full data, which sets the scales, and then a filtered layer using data = . %>% filter(...):

df %>%
  filter(var3 < 80) %>%
  filter(var2 != 56) %>%
  mutate(var2 = as.numeric(var2)) %>%
  ggplot(aes(x=group,y=var1,color=var2)) + 
  geom_point(alpha = 0) +
  geom_point(data = . %>% filter(year %in% dlist[i]))
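
For the loop mentioned in the question, one possible sketch follows. A layer's data argument accepts a function, which ggplot2 applies to the plot's default data; . %>% filter(...) builds such a function. The version below uses an ordinary anonymous function inside a hypothetical helper, plot_year(), so that each plot captures its own year instead of reading i when the plot is finally drawn. The toy data frame only mimics the shape of the question's df, and the name cleaned is an assumption for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data shaped like the question's df (values are not identical to it)
set.seed(1)
df <- data.frame(
  year  = rep(1:4, each = 25),
  group = rep(c("a", "b", "c", "d", "e"), times = 20),
  var1  = runif(100, min = 0, max = 30),
  var2  = sample(1:500, 100, replace = TRUE),
  var3  = runif(100, min = 0, max = 100)
)

# Cleaning steps from the question, done once
cleaned <- df %>%
  filter(var3 < 80, var2 != 56) %>%
  mutate(var2 = as.numeric(var2))

# Hypothetical helper: one plot per year, scales trained on all of `data`
plot_year <- function(data, yr) {
  force(yr)  # fix the value of yr now, not when the plot is printed
  ggplot(data, aes(x = group, y = var1, color = var2)) +
    geom_point(alpha = 0) +                                  # invisible, full data
    geom_point(data = function(d) filter(d, year == yr)) +   # visible, one year
    ggtitle(paste("Year", yr))
}

# Build one plot per year; print(plots[[2]]) would draw year 2
plots <- lapply(1:4, function(yr) plot_year(cleaned, yr))
```

Because the invisible layer still sees every row, the x, y, and color scales all stay the same from one year to the next without setting any limits by hand.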


Upvotes: 11

dc37

Reputation: 16178

You can use scale_color_gradient and set the limits of your scale:

df %>%
    filter(var3 < 80 & var2 != 56) %>%
    mutate(var2 = as.numeric(var2)) %>%
    filter(year %in% dlist[i]) %>%   # filter to the current year; the limits set below keep the colorbar fixed
    ggplot(aes(x=group,y=var1,color=var2)) + 
    geom_point()+
    scale_color_gradient(limits = c(min(df$var2),max(df$var2)))
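
A possible variant of this approach for the loop, as a sketch only: the snippet above takes its limits from the raw df, which can be wider than the range that survives the cleaning filters. The version below computes the limits once from the cleaned data with range() and reuses them for every year. The names cleaned and var2_limits are assumptions for illustration, and the toy data only mimics the shape of the question's df.

```r
library(dplyr)
library(ggplot2)

# Toy data shaped like the question's df (values are not identical to it)
set.seed(1)
df <- data.frame(
  year  = rep(1:4, each = 25),
  group = rep(c("a", "b", "c", "d", "e"), times = 20),
  var1  = runif(100, min = 0, max = 30),
  var2  = sample(1:500, 100, replace = TRUE),
  var3  = runif(100, min = 0, max = 100)
)

# Cleaning steps from the question, done once
cleaned <- df %>%
  filter(var3 < 80, var2 != 56) %>%
  mutate(var2 = as.numeric(var2))

# Color limits taken from all cleaned rows, before any per-year filter
var2_limits <- range(cleaned$var2)

years <- sort(unique(cleaned$year))

# One plot per year, each with the same color limits
plots <- lapply(years, function(yr) {
  cleaned %>%
    filter(year == yr) %>%
    ggplot(aes(x = group, y = var1, color = var2)) +
    geom_point() +
    scale_color_gradient(limits = var2_limits)
})
```

This fixes only the color scale. If a year were missing a group, or covered a narrower var1 range, the x and y axes would still change from plot to plot unless they are given explicit limits as well.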

Upvotes: 2
