drmariod

Reputation: 11762

Replacing NAs with a fixed value but excluding them from geom_smooth

I am trying to make a scatter plot and also plot a regression line for my data.

Before plotting, I want to replace the NAs with a fixed number so that all points appear in my graph. Since the replaced points then all lie on one line, they are easy to spot.

But doing it this way messes up my geom_smooth. Is there a better way to replace the missing values with a fixed number while still fitting geom_smooth without them?

library(ggplot2)
set.seed(1234)
df <- data.frame(x=rnorm(100),
                 y=c(rnorm(40), rep(NA,60)))
df[is.na(df)] <- -5
ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method="lm", fullrange=TRUE)

As you can see in the example, the smooth line is pulled toward the "imputed" values.

Upvotes: 1

Views: 1681

Answers (1)

juba

Reputation: 49033

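One way to do it is to keep your data in two different data frames: the original df with its NAs intact (that is, without running df[is.na(df)] <- -5 on it), and a copy in which the NAs are replaced: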

df2 <- df
df2[is.na(df2)] <- -5

Then plot them in two different layers, using the copy for the points and the original for the smooth:

ggplot() + geom_point(data=df2, aes(x,y)) + geom_smooth(data=df, aes(x,y), method="lm", fullrange=TRUE)

But maybe a cleaner way to do it would be to leave the NAs alone and mark the affected rows with geom_rug(), something like this:

dfna <- df[is.na(df$y),]
ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method="lm", fullrange=TRUE) + geom_rug(data=dfna, aes(x))

This fits the regression line on the observed values only, and draws a tick along the x-axis for each row whose y is missing.
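
A third variant, as a rough sketch rather than a definitive solution: keep a single data frame, record which rows were missing, and let the smooth layer filter them out itself. The column names `missing` and `y_plot` are made up for this illustration and are not part of the original post. It relies on the fact that a layer's `data` argument may be a function that receives the plot data.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(x = rnorm(100),
                 y = c(rnorm(40), rep(NA, 60)))

# Hypothetical helper columns (not in the original post):
# remember which rows were missing, then fill them with the placeholder.
df$missing <- is.na(df$y)
df$y_plot  <- ifelse(df$missing, -5, df$y)

p <- ggplot(df, aes(x, y_plot)) +
  # Colour the placeholder points so they are not mistaken for real values.
  geom_point(aes(colour = missing)) +
  # The function passed as `data` is called with the plot data;
  # here it drops the placeholder rows before the linear fit.
  geom_smooth(data = function(d) d[!d$missing, ],
              method = "lm", formula = y ~ x, fullrange = TRUE)
print(p)
```

Compared with the two-data-frame approach, this keeps everything in one object, at the cost of two extra columns.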

Upvotes: 5
