Reputation: 5540
I have a data frame in R as follows:
gen pos count
A 1 10
A 2 20
A 3 15
A 4
...
B 1 50
B 2 30
B 3
B 4 40
...
The data frame contains ~30000 rows. Values for count are intentionally missing in ~300 rows. I plot these data with:
ggplot(data=d, aes(x=pos, y=count, group=gen, colour=gen)) + geom_line()
The missing data points are absent on the plot, which is what I want. I am happy with the plot.
However, ggplot returns the following warning:
Removed 2 rows containing missing values (geom_path).
If there are ~300 missing values for count (there are no missing values for gen or pos), why is ggplot reporting only 2?
Upvotes: 3
Views: 922
Reputation: 31171
Take a simple example:
library(ggplot2)

df <- data.frame(gen = rep(letters[1:3], each = 6),
                 y = c(NA, 2, 5, 6, NA, 8, 9, NA, 1, 2, 3, 1, 4, 3, 6.5, 4.2, 1, NA),
                 x = rep(1:6, 3))
ggplot(df, aes(x = x, y = y, colour = gen)) + geom_line()
And we have the warning:
Warning message:
Removed 2 rows containing missing values (geom_path).
Looking at the resulting plot, we see that:
(6,8) in group a is absent even though it has no NA. It is an 'isolated point': it cannot be linked to the previous value (5, NA), which has NA.
(1,9) in group b is absent even though it has no NA. It is also an isolated point: it cannot be linked to the next value, since that one has NA.
Hence the warning message just gives an indication of how many regular (not NA) but isolated points are removed from the graph. Here, 2.
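The count of 2 can also be checked directly against the data. The sketch below assumes (from a reading of the ggplot2 source, so worth verifying against your installed version) that geom_path drops only the rows with missing values at the start or end of each group, and keeps interior ones so that they show up as breaks in the line. The helper name edge_na is hypothetical, and the code uses base R only. Note that these rows are not the same as the isolated points described above; in this example both counts happen to be 2.

```r
# Toy data from the example above, already ordered by x within each group.
df <- data.frame(gen = rep(letters[1:3], each = 6),
                 y = c(NA, 2, 5, 6, NA, 8, 9, NA, 1, 2, 3, 1, 4, 3, 6.5, 4.2, 1, NA),
                 x = rep(1:6, 3))

# Hypothetical helper: TRUE for NAs that come before the first or after the
# last non-missing value of a vector; interior NAs give FALSE.
edge_na <- function(v) {
  ok <- which(!is.na(v))
  if (length(ok) == 0) return(rep(TRUE, length(v)))
  idx <- seq_along(v)
  idx < min(ok) | idx > max(ok)
}

# Apply the helper within each group, keeping the original row order.
# ave() returns 0/1 here because y is numeric, hence as.logical().
df$edge_na <- as.logical(ave(df$y, df$gen, FUN = edge_na))

sum(df$edge_na)    # NA rows at the start or end of a group: 2
sum(is.na(df$y))   # all NA rows, including interior ones: 4
```

Under the same assumption, this would explain why ~300 missing values in the original data frame can produce a warning about only 2 rows: only the NAs at the edges of a group are counted. If the warning itself is unwanted, geom_line(na.rm = TRUE) silences it.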
Upvotes: 5