Ben Eichler

Reputation: 491

Eliminating NAs from a ggplot

Very basic question here, as I'm just starting to use R. I'm trying to create a bar plot of factor counts in ggplot2, but the plot shows 14 little colored blips for my actual levels and then a massive grey bar at the end representing the 5000-ish NAs in the sample (it's survey data from a question that only applies to about 5% of the sample). I've tried the following code to no avail:

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

The addition of the na.rm argument here has no apparent effect.

Meanwhile,

ggplot(data = na.omit(MyData),aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

gives me

"Error: Aesthetics must either be length one, or the same length as the data"

as does affixing the na.omit() to the_variable, or both MyData and the_variable.

All I want to do is eliminate the giant NA bar from my graph, can someone please help me do this?

Upvotes: 49

Views: 257537

Answers (7)

Quinten

Reputation: 41265

Another option is using the function complete.cases like this:

library(ggplot2)
# With NA
ggplot(airquality, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 37 rows containing non-finite values (stat_bin).

# Remove NA using complete.cases
airquality_complete <- airquality[complete.cases(airquality), ]
ggplot(airquality_complete, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2022-08-25 with reprex v2.0.2
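
A note on scope: complete.cases(airquality) checks every column, so it also drops rows where Ozone is present but another column is missing. Below is a minimal sketch of the difference, using only base R and the built-in airquality data. The names all_columns and ozone_only are made up for illustration.

```r
# Rows where every column is non-missing
all_columns <- airquality[complete.cases(airquality), ]

# Rows where only the plotted column (Ozone) is non-missing
ozone_only <- airquality[complete.cases(airquality$Ozone), ]

nrow(airquality)   # 153 rows in the original data
nrow(all_columns)  # 111 rows: rows with NA in Solar.R are dropped too
nrow(ozone_only)   # 116 rows: only the 37 rows with NA in Ozone are dropped
```

If only one variable is being plotted, restricting the check to that column keeps more of the data.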

Upvotes: 0

Bryan F

Reputation: 950

Try remove_missing instead, with vars = "the_variable". It is very important that you set the vars argument (a character vector of column names); otherwise remove_missing will remove all rows that contain an NA in any column! Setting na.rm = TRUE suppresses the warning message.

ggplot(data = remove_missing(MyData, na.rm = TRUE, vars = "the_variable"), aes(x = the_variable, fill = the_variable)) + 
       geom_bar(stat="bin") 
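
For reference, here is a small sketch of remove_missing on the built-in airquality data (not the asker's MyData). It assumes vars is passed as a character vector of column names, as the ggplot2 documentation describes. The names ozone_rows and all_rows are made up for illustration.

```r
library(ggplot2)

# Drop only the rows where Ozone is missing; na.rm = TRUE silences the warning
ozone_rows <- remove_missing(airquality, na.rm = TRUE, vars = "Ozone")

# Leaving vars at its default checks every column of the data frame
all_rows <- remove_missing(airquality, na.rm = TRUE)

nrow(ozone_rows)  # 116
nrow(all_rows)    # 111
```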

Upvotes: 12

regents

Reputation: 626

Additionally, adding na.rm= TRUE to your geom_bar() will work.

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin", na.rm = TRUE)

I ran into this issue with a loop in a time series and this fixed it. The missing data is removed and the results are otherwise unaffected.

Upvotes: 27

rafa.pereira

Reputation: 13807

You can use the function subset inside the ggplot() call. Try this:

library(ggplot2)

data("iris")
iris$Sepal.Length[5:10] <- NA # create some NAs for this example

ggplot(data = subset(iris, !is.na(Sepal.Length)), aes(x = Sepal.Length)) + 
  geom_bar(stat="bin")

Upvotes: 61

JKao

Reputation: 121

Not sure if you have solved the problem yet. For this issue, you can use the filter function from the dplyr package. The idea is to keep only the observations/rows where the variable of interest is not NA, and then make the graph from those filtered observations. My code is below; note that the names of the data frame and the variable are copied from your question. I also assume you are familiar with the pipe operator.

library(tidyverse) 

MyData %>%
   filter(!is.na(the_variable)) %>%
   ggplot(aes(x = the_variable, fill = the_variable)) + 
   geom_bar(stat="bin") 

This should remove the annoying NAs from your plot. Hope this works :)
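
Since the question is about a factor, here is a minimal sketch on made-up survey data. The data frame survey and the column answer are hypothetical stand-ins for MyData and the_variable. It uses geom_bar() with its default count stat, on the assumption that a recent ggplot2 is installed, where stat = "bin" expects a continuous x variable.

```r
library(dplyr)
library(ggplot2)

# Hypothetical survey data: most respondents skipped the question (NA)
survey <- data.frame(
  answer = factor(c("a", "b", "b", "c", NA, NA, NA, NA, NA, NA))
)

# Keep only the rows that actually have an answer
answered <- survey %>%
  filter(!is.na(answer))

# One bar per factor level; the NA rows are gone before plotting
p <- ggplot(answered, aes(x = answer, fill = answer)) +
  geom_bar()

# print(p) draws the plot
```

Filtering before the ggplot() call leaves the original data frame untouched, so the NA rows remain available for other analyses.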

Upvotes: 12

ikashnitsky

Reputation: 3111

Just an update to the answer of @rafa.pereira. Since ggplot2 is part of tidyverse, it makes sense to use the convenient tidyverse functions to get rid of NAs.

library(tidyverse)
airquality %>% 
        drop_na(Ozone) %>%
        ggplot(aes(x = Ozone))+
        geom_bar(stat="bin")

Note that you can also use drop_na() without specifying any columns; then every row with an NA in any column will be removed.
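
A quick way to see the difference between the two forms is to compare row counts on airquality. This is only a sketch, and the names ozone_only and all_columns are made up for illustration.

```r
library(tidyr)

ozone_only  <- drop_na(airquality, Ozone)  # checks only the Ozone column
all_columns <- drop_na(airquality)         # checks every column

nrow(ozone_only)   # 116
nrow(all_columns)  # 111
```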

Upvotes: 33

Isis Costa

Reputation: 1

In my view, the error "Error: Aesthetics must either be length one, or the same length as the data" refers to the arguments in aes(x, y). I tried na.omit() and it worked just fine for me.

Upvotes: 0
