Sophie Allan

Reputation: 87

Delete missing values detected by ggplot() in R

I previously asked how to plot many graphs in the same plot. Following the answer, which I accepted, I am now using the ggplot() function.

With ggplot(), I receive the following message, which tells me that missing values were removed from the plot:

Warning message:
Removed 33 row(s) containing missing values (geom_path).

Judging by the resulting plot, I am satisfied with the data after ggplot() removed the 33 rows.

I know how to delete rows containing NA, but I don't understand which rows ggplot() removed: rows where at least one variable is NA, or only rows where all variables are NA. I have 7 variables; some rows are NA for every variable, while many rows are NA for only some of them.

Question: The rows are already removed from the plot, but how can I remove these 33 detected rows from the data itself?

Upvotes: 1

Views: 2662

Answers (2)

zx8754

Reputation: 56054

ggplot removes rows that have NA in the columns mapped in aes(). For example, if x and y are mapped but the data frame also has a z column, rows are dropped only when x or y is NA; an NA in z has no effect.

Here is an example:

library(ggplot2)

x <- head(mtcars)

# add NA to some column we don't use for ggplot
x$am[ 1 ] <- NA

ggplot(x, aes(cyl, mpg)) + geom_point()
# no warnings

# now add NA to column that we use for plotting
x$cyl[ 1 ] <- NA

ggplot(x, aes(cyl, mpg)) + geom_point()
# Warning message:
#   Removed 1 rows containing missing values (geom_point). 

# to avoid that warning, we can explicitly set it to remove NA
ggplot(x, aes(cyl, mpg)) + geom_point(na.rm = TRUE)
# no warnings

To remove rows from the data, check if the selected columns have NA:

x_clean <- x[ !(is.na(x$cyl) | is.na(x$mpg)), ]
ggplot(x_clean , aes(cyl, mpg)) + geom_point()
# no warnings
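
The "at least one NA versus all NA" distinction from the question can also be checked directly on the data. Below is a minimal base-R sketch, assuming the plotted columns are cyl and mpg as in the example; `plot_cols`, `x_any`, and `x_all` are illustrative names, not part of ggplot2:

```r
# Toy data: first six rows of mtcars with NAs injected (illustrative only)
x <- head(mtcars)
x$am[1]  <- NA   # column not used in the plot
x$cyl[1] <- NA   # column used in the plot
x$mpg[2] <- NA   # column used in the plot

plot_cols <- c("cyl", "mpg")   # assumption: the columns mapped in aes()

# Rows with NA in at least one plotted column
any_na <- !complete.cases(x[, plot_cols])
# Rows where every column of the data frame is NA
all_na <- rowSums(is.na(x)) == ncol(x)

sum(any_na)   # rows with a missing plotted value
sum(all_na)   # rows that are entirely empty

x_any <- x[!any_na, ]   # keep rows complete in the plotted columns
x_all <- x[!all_na, ]   # drop only the fully empty rows
```

Comparing `sum(any_na)` with the number reported in the warning is one way to see which of the two cases applies to your own data.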

Edit 1: Based on the comments, to apply this to your data, try the code below (note the filter step):

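library(dplyr)    # bind_rows(), mutate(), filter(), %>%
library(tidyr)    # pivot_longer()
library(ggplot2)

Data <- bind_rows(...)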
Data %>%
  mutate(data = paste0('Data',data)) %>%
  pivot_longer(-c(data,Time)) %>%
  filter(!(is.na(Time) | is.na(value))) %>% 
  ggplot(aes(x = factor(Time), y = value, group = name, color = name)) +
  geom_line()+
  facet_wrap(.~data,scales = 'free', ncol = 1) +
  xlab('Time')

Edit 2: To "know" exactly what data goes into ggplot, keep the filtered, clean data as a separate object instead of piping it straight into the plot:

Data <- bind_rows(...)
cleanData <- Data %>% 
  mutate(data = paste0('Data',data)) %>%
  pivot_longer(-c(data,Time)) %>%
  filter(!(is.na(Time) | is.na(value)))
  
ggplot(cleanData, aes(x = factor(Time), y = value, group = name, color = name)) +
  geom_line()+
  facet_wrap(.~data,scales = 'free', ncol = 1) +
  xlab('Time')
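
To see how many rows such a filter drops, the row counts before and after can be compared. The sketch below uses a made-up data frame (`wide`, with hypothetical variables `a` and `b`) and a base-R stand-in for pivot_longer(), since the real Data object is not shown here:

```r
# Hypothetical wide data: Time plus two measured variables (names are made up)
wide <- data.frame(
  Time = 1:4,
  a    = c(1.0, NA, 3.0, NA),
  b    = c(NA, 2.5, 3.5, NA)
)

# Base-R stand-in for pivot_longer(): one row per Time/variable pair
long <- data.frame(
  Time  = rep(wide$Time, times = 2),
  name  = rep(c("a", "b"), each = nrow(wide)),
  value = c(wide$a, wide$b)
)

# Same condition as the filter() step: keep rows with both Time and value
keep       <- !(is.na(long$Time) | is.na(long$value))
clean_long <- long[keep, ]

nrow(long) - nrow(clean_long)   # total number of rows the filter drops
table(long$name[!keep])         # dropped rows per variable
```

After reshaping, every row carries a single value, so "NA for some variables" and "NA for all variables" are no longer separate cases. The total may not match the number in a geom_path warning exactly, because line geoms may treat missing values inside a series differently from those at its ends.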

Upvotes: 1

Ben Norris

Reputation: 5747

Those rows could have NA values, or they could be out of bounds of the axis limits you set. ggplot() generates the same warning in both cases. Here is an example of the latter.

This is the built-in mtcars data set. Notice that there are no missing values:

mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

If I build the following plot, I get the ggplot warning about rows with missing values.

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = qsec)) + 
  geom_point() +
  scale_x_continuous(limits = c(2, 4)) +
  scale_y_continuous(limits = c(16, 22))
Warning message:
Removed 14 rows containing missing values (geom_point).


The 14 rows with "missing values" are the 14 rows with data out of bounds of the axis limits. Here they are.

library(dplyr)
mtcars %>%
  filter(wt < 2 | wt > 4 | qsec < 16 | qsec > 22)
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

Before attempting to remove "missing values" from your data, check to see if your plotting parameters exclude some of the data.
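
One way to run that check is to count both possible causes separately. This is a rough base-R sketch using the limits from the plot above; `x_lim`, `y_lim`, `n_na`, and `n_oob` are illustrative names:

```r
# Limits copied from the scale_x_continuous()/scale_y_continuous() calls
x_lim <- c(2, 4)    # limits for wt
y_lim <- c(16, 22)  # limits for qsec

# Rows with a missing value in either plotted column
n_na <- sum(is.na(mtcars$wt) | is.na(mtcars$qsec))

# Rows falling outside either pair of limits
out_of_bounds <- mtcars$wt   < x_lim[1] | mtcars$wt   > x_lim[2] |
                 mtcars$qsec < y_lim[1] | mtcars$qsec > y_lim[2]
n_oob <- sum(out_of_bounds)

n_na    # 0
n_oob   # 14
```

If the goal is only to zoom in, coord_cartesian(xlim = c(2, 4), ylim = c(16, 22)) restricts the visible region without discarding data, so it is worth considering in place of scale limits.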

Upvotes: 0
