SabreWolfy

Reputation: 5540

Number of missing values in ggplot

I have a data frame in R as follows:

gen    pos    count
A      1      10
A      2      20
A      3      15
A      4      
...
B      1      50
B      2      30
B      3      
B      4      40
...

The data frame contains ~30000 rows. Values for count are intentionally missing in ~300 rows. I plot these data with:

ggplot(data=d, aes(x=pos, y=count, group=gen, colour=gen)) + geom_line()

The missing data points are absent on the plot, which is what I want. I am happy with the plot.

However, ggplot returns the following warning:

Removed 2 rows containing missing values (geom_path). 

If there are ~300 missing values (for count; there are no missing values for gen or pos), why is ggplot reporting only 2?

Upvotes: 3

Views: 922

Answers (1)

Colonel Beauvel

Reputation: 31171

Take a simple example:

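library(ggplot2)

# Three groups (a, b, c) of six points each; four of the y values are NA
df <- data.frame(gen = rep(letters[1:3], each = 6),
                 y   = c(NA, 2, 5, 6, NA, 8, 9, NA, 1, 2, 3, 1, 4, 3, 6.5, 4.2, 1, NA),
                 x   = rep(1:6, 3))

# colour = gen also splits the data into one line per group
ggplot(df, aes(x = x, y = y, colour = gen)) + geom_line()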

And we have the warning:

Warning message:
Removed 2 rows containing missing values (geom_path). 
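
For comparison with the 2 reported in the warning, the missing y values in this example can be counted directly. A minimal base R sketch (the data frame is redefined so the snippet runs on its own; n_missing and per_group are names chosen here for illustration):

```r
# Same example data as above
df <- data.frame(gen = rep(letters[1:3], each = 6),
                 y   = c(NA, 2, 5, 6, NA, 8, 9, NA, 1, 2, 3, 1, 4, 3, 6.5, 4.2, 1, NA),
                 x   = rep(1:6, 3))

# Total number of missing y values
n_missing <- sum(is.na(df$y))
n_missing    # 4

# Missing y values per group
per_group <- tapply(is.na(df$y), df$gen, sum)
per_group    # a = 2, b = 1, c = 1
```

So this small example shows the same mismatch as in the question: more missing values in the data than rows reported in the warning.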

By looking at the graph below, we see that:

  • For group a, the last point, (6, 8), is absent even though its y value is not NA. It is an 'isolated point': it cannot be joined to the previous point, (5, NA), whose y value is missing.
  • For group b, the first point, (1, 9), is absent even though its y value is not NA. It is also an isolated point: it cannot be joined to the next point, whose y value is NA.
  • For group c, the last point is absent, which is expected since its y value is NA.

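Note, however, that these isolated points are not what the warning counts. geom_line only removes rows with missing values at the start or end of each group's line; a missing value in the middle of a line is kept and shows up as a break, without a warning. Here that gives 2 rows: the leading NA in group a (x = 1) and the trailing NA in group c (x = 6). The isolated points in groups a and b are simply not drawn, because a line segment needs two adjacent non-missing points. This is why the reported number can be much smaller than the total number of missing values.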
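
The number in the warning can be reproduced by hand. This is only a sketch, assuming (as the ggplot2 documentation on missing value handling for geom_path and geom_line describes) that the rows removed are those with NA at the start or end of each group's line. The helper edge_na is a hypothetical name introduced here, not a ggplot2 function, and its handling of a group with no non-missing values at all is an assumption:

```r
# Same example data as above
df <- data.frame(gen = rep(letters[1:3], each = 6),
                 y   = c(NA, 2, 5, 6, NA, 8, 9, NA, 1, 2, 3, 1, 4, 3, 6.5, 4.2, 1, NA),
                 x   = rep(1:6, 3))

# Hypothetical helper: given a logical "is missing" vector for one group,
# flag the missing entries that come before the first or after the last
# non-missing entry. Missing entries in the middle are not flagged.
edge_na <- function(missing) {
  ok <- which(!missing)
  # Assumption: a group with no non-missing values is flagged entirely
  if (length(ok) == 0) return(rep(TRUE, length(missing)))
  idx <- seq_along(missing)
  idx < min(ok) | idx > max(ok)
}

# Apply the helper within each group (rows are already ordered by x)
dropped <- ave(is.na(df$y), df$gen, FUN = edge_na)

sum(dropped)     # 2, matching the warning
df[dropped, ]    # the leading NA in group a and the trailing NA in group c
```

Applied to the data frame in the question, the same check would show how many of the ~300 missing counts sit at the start or end of a gen group, provided the rows are ordered by pos within each group.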

[Figure: line plot of y against x for the example data, one coloured line per gen group]

Upvotes: 5
