Reputation: 3256
I have a continuous scale that includes some values coding different categories of missing data (for example 998, 999), and I want to make a plot that excludes these numeric missing values.
Since the values are adjacent, I can use xlim each time, but because it sets the domain of the plot I have to change the limits for each different case.
So I am asking for a solution. I can think of two possibilities:
Is there a function that does the opposite of what xlim does?
Is there an option for xlim, meaning that the range determined by the limits (or a given discrete set of values) won't be included in the x-axis?
Thanks in advance.
Upvotes: 0
Views: 846
Reputation: 13680
I would filter those missing values from the original dataset:
library(dplyr)

df <- data.frame(
  cat = rep(LETTERS[1:4], 3),
  values = sample(10, 12, replace = TRUE)
)

# Add missing values
df$values[c(1, 5, 10)] <- 999
df$values[c(2, 7)] <- 998

invalid_values <- c(998, 999)

library(ggplot2)

df %>%
  filter(!values %in% invalid_values) %>%
  ggplot() +
  geom_point(aes(cat, values))
Alternatively, if that's not possible for some reason, you can define a scale transformation:
df %>%
  ggplot() +
  geom_point(aes(cat, values)) +
  scale_y_continuous(trans = scales::trans_new(
    "remove_invalid",
    transform = function(d) if_else(d %in% invalid_values, NA_real_, d),
    inverse = function(d) if_else(is.na(d), 999, d)
  ))
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 5 rows containing missing values (geom_point).
Created on 2018-05-09 by the reprex package (v0.2.0).
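A variation on the filtering idea, in case you want to keep all the rows: recode the missing codes to NA and let ggplot2 drop them when drawing. This is only a sketch. The fixed seed and the values_clean column name are my own assumptions, not part of the data above.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible

df <- data.frame(
  cat = rep(LETTERS[1:4], 3),
  values = sample(10, 12, replace = TRUE)
)
df$values[c(1, 5, 10)] <- 999
df$values[c(2, 7)] <- 998

invalid_values <- c(998, 999)

# Recode the missing codes to NA; values_clean is a hypothetical column name
df$values_clean <- replace(df$values, df$values %in% invalid_values, NA)

# na.rm = TRUE drops the NA rows without the "Removed rows" warning
p <- ggplot(df) +
  geom_point(aes(cat, values_clean), na.rm = TRUE)
```

The original column stays untouched, so you can still tabulate how many 998s and 999s there were.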
Upvotes: 1
Reputation: 621
I think the simplest way is to exclude these values in the plot, either before or during the ggplot call.
library(tidyverse)

# Create data with overflowing values
mtcars2 <- mtcars
mtcars2[5:15, "mpg"] <- 998

# Full plot, including the 998 codes
mtcars2 %>%
  ggplot() +
  geom_point(aes(x = mpg, y = disp))

# Option 1: filter before the ggplot call
mtcars2 %>%
  filter(mpg < 250) %>%
  ggplot() +
  geom_point(aes(x = mpg, y = disp))

# Option 2: filter during the ggplot call, through the layer's data argument
mtcars2 %>%
  ggplot() +
  geom_point(aes(x = mpg, y = disp), data = . %>% filter(mpg < 250))
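If you need this for several variables, the filter step could be wrapped in a small helper so the cutoff doesn't have to be adjusted for each case. A rough sketch: drop_codes is a name I made up for illustration, and the default codes are assumed from the question.

```r
library(ggplot2)

# Hypothetical helper: keep only the rows whose `column` is not one of `codes`
drop_codes <- function(data, column, codes = c(998, 999)) {
  data[!(data[[column]] %in% codes), , drop = FALSE]
}

mtcars2 <- mtcars
mtcars2[5:15, "mpg"] <- 998

# Remove the rows coded as missing, then plot as usual
clean <- drop_codes(mtcars2, "mpg")

p <- ggplot(clean) +
  geom_point(aes(x = mpg, y = disp))
```

Matching on the exact codes rather than on a threshold such as mpg < 250 means a legitimately large value would not be dropped by accident.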
Upvotes: 2