Reputation: 2341
I have data as follows, on which I run the ggplot code below:
data <- structure(list(country_mean_rep = structure(c(73.6995708154506,
93.5501285347044, 85.1529051987768, 91.1017369727047, 79.5562130177515,
84.6751054852321, 89.8, 86.8826405867971, 94.2247191011236, 70.2321428571429,
88.4107142857143), label = "label", format.stata = "%9.2f"),
country_mean_crime = c(0.0944206008583691, 0.0565552699228792,
0.0336391437308868, 0.205955334987593, 0.130177514792899,
0.282700421940928, 0.220512820512821, 0.415647921760391,
0.387640449438202, 0.200892857142857, 0.292207792207792),
country_name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 11L, 12L,
14L, 16L, 20L), .Label = c("Albania", "Armenia", "Azerbaijan",
"Belarus", "Bosnia and Herzegovina", "Brazil", "Bulgaria",
"Cambodia", "Chile", "CostaRica", "Croatia", "Czech", "Ecuador",
"Estonia", "FYROM", "Georgia", "Germany", "Greece", "Guyana",
"Hungary", "Ireland", "Kazakhstan", "Kenya", "Kyrgyzstan",
"Latvia", "Lithuania", "Malawi", "Mali", "Moldova", "Philippines",
"Poland", "Portugal", "Romania", "Russia", "Senegal", "Serbia&Montenegro",
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain",
"SriLanka", "Tajikistan", "Turkey", "Ukraine", "Uzbekistan",
"Vietnam"), class = "factor")), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
# On which I like to run the following code:
ggplot(data, aes(x=country_mean_rep, y=country_mean_crime)) +
geom_point() +
geom_smooth(aes(colour="linear", fill="linear"),
method="lm",
formula=y ~ x) +
geom_smooth(aes(colour="quadratic", fill="quadratic"),
method="lm",
formula=y ~ x + I(x^2)) +
geom_smooth(aes(colour="cubic", fill="cubic"),
method="lm",
formula=y ~ x + I(x^2) + I(x^3)) +
labs(colour="Functional Form", fill="Functional Form") +
geom_text(aes(label=country_name), nudge_y=0.02) +
theme_bw()
Now let's say that the Czech Republic is an outlier, which I want to remove from the fits I am doing (especially the linear one). Please note that I understand there is nothing wrong with the Czech Republic in the example; I need to know this for a genuine outlier in my actual data.
Is there some way of excluding it only from the fit, while keeping the dot in the plot?
Upvotes: 4
Views: 1297
Reputation: 76402
Here is a way.
Start the plot with the subset of the data that excludes "Czech", and only use the entire data set for the data argument of geom_point. This way the point "Czech" will be plotted but excluded from the fits.
In fact, it is excluded from everything else. So if you want the "Czech" label you will have to also use data = data (the full data set) in geom_text.
library(ggplot2)
ggplot(data = subset(data, country_name != "Czech"), aes(x=country_mean_rep, y=country_mean_crime)) +
geom_point(data = data) +
geom_smooth(aes(colour="linear", fill="linear"),
method="lm",
formula=y ~ x) +
geom_smooth(aes(colour="quadratic", fill="quadratic"),
method="lm",
formula=y ~ x + I(x^2)) +
geom_smooth(aes(colour="cubic", fill="cubic"),
method="lm",
formula=y ~ x + I(x^2) + I(x^3)) +
labs(colour="Functional Form", fill="Functional Form") +
geom_text(aes(label=country_name), nudge_y=0.02) +
theme_bw()
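A variation on the same idea, offered as a minimal sketch rather than the answer's own method: keep the full data set as the plot default and pass a function to the data argument of the fitting layer only. ggplot2 calls that function with the plot's data and uses the returned data frame for that layer. The toy data frame and the drop_outlier helper below are hypothetical stand-ins for the question's data.

```r
library(ggplot2)

# Hypothetical toy data; "Outlier" plays the role of "Czech"
toy <- data.frame(
  x    = c(1, 2, 3, 4, 5, 10),
  y    = c(2, 4, 6, 8, 10, 0),
  name = c("A", "B", "C", "D", "E", "Outlier")
)

# Hypothetical helper: ggplot2 passes the plot data to it
# and fits only on what it returns
drop_outlier <- function(d) subset(d, name != "Outlier")

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +                                   # all six points
  geom_smooth(data = drop_outlier,                 # fit without the outlier
              method = "lm", formula = y ~ x) +
  geom_text(aes(label = name), nudge_y = 0.5) +    # all six labels
  theme_bw()

print(p)
```

With this layout the points and labels need no extra data argument, and only the layers that should ignore the outlier carry the filter.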
Upvotes: 2
Reputation: 11255
One way to do it would be to give different layers different data:
ggplot(subset(data, country_name != 'Czech'), aes(x=country_mean_rep, y=country_mean_crime)) +
geom_smooth(aes(colour="linear", fill="linear"),
method="lm",
formula=y ~ x) +
geom_smooth(aes(colour="quadratic", fill="quadratic"),
method="lm",
formula=y ~ x + I(x^2)) +
geom_smooth(aes(colour="cubic", fill="cubic"),
method="lm",
formula=y ~ x + I(x^2) + I(x^3)) +
labs(colour="Functional Form", fill="Functional Form") +
geom_point(data = data, inherit.aes = FALSE, aes(x = country_mean_rep, y = country_mean_crime)) +
geom_text(data = data, aes(label=country_name, x = country_mean_rep, y = country_mean_crime), inherit.aes = FALSE, nudge_y=0.02) +
theme_bw()
In this case, the three lm fits use the subsetted data, whereas the calls to geom_point and geom_text are given the full data set and, because of inherit.aes = FALSE, do not inherit the original aesthetics but specify their own.
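To see how much the excluded point actually moves the linear fit, one option is to compare plain lm fits with and without it. This is only a sketch on hypothetical toy data (the toy data frame and the "Outlier" label are not from the question), using base R alone.

```r
# Hypothetical toy data; "Outlier" plays the role of "Czech"
toy <- data.frame(
  x    = c(1, 2, 3, 4, 5, 10),
  y    = c(2, 4, 6, 8, 10, 0),
  name = c("A", "B", "C", "D", "E", "Outlier")
)

# Linear fit on every row, outlier included
fit_all <- lm(y ~ x, data = toy)

# Linear fit on the same subset that the smoothing layers would see
fit_excl <- lm(y ~ x, data = subset(toy, name != "Outlier"))

# Side-by-side intercepts and slopes
print(rbind(all = coef(fit_all), excluded = coef(fit_excl)))
```

A large difference between the two rows suggests the single point is driving the fit, which is the situation where fitting on the subset matters most.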
Upvotes: 2