Reputation: 7063
Is there a way to get the same result as geom_line with base R? It feels like this should be easy, but when I tried to understand what geom_line is doing (and how), I got lost in the code.
(It should be possible to automate with an arbitrary number of "lines" - not only 2.)
Background: I would like to display the two fitted lines (one per level of dep) as in the ggplot2 code below, but I have not managed to do so in base R. Any ideas?
Reproducible example:
library(ggplot2)
set.seed(1)
sd_age <- 1000
age <- sample(c(20:65), 24)
s_a1 <- 80000 + 100 * age[1:8]
s_a2 <- 70000 + 100 * age[9:24]
df <- data.frame(salary = c(s_a1, s_a2),
                 dep = c(rep("A1", length(s_a1)), rep("A2", length(s_a2))),
                 age = c(age[1:8], age[9:24]),
                 gender = c(0, 1, 0, 1, 0, 1, 0, 1,
                            1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1),
                 stringsAsFactors = FALSE)
df$gender <- as.factor(df$gender)
df$dep <- as.factor(df$dep)
df$salary <- df$salary + rnorm(nrow(df), 0, sd_age)
fit2 <- lm(salary ~ age + dep, data = df)
df$fit2 <- predict(fit2)
ggplot(df, aes(x = age, y = salary, shape = dep, colour = gender, fill = dep)) +
  geom_point(size = 3) +
  xlab("age") +
  ylab("salary") +
  ggtitle("whatever") +
  geom_line(data = df,
            mapping = aes(x = age, y = fit2), size = 1.2, color = "blue")
The best I got is
plot(df$age[df$gender == 0], df$salary[df$gender == 0],
     xlim = c(18, 67), ylim = c(60000, 100000)) # men
points(df$age[df$gender == 1], df$salary[df$gender == 1],
       col = "blue") # women
lines(df$age, df$fit2, col = "blue")
Upvotes: 0
Views: 57
Reputation: 32548
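Subset the data by each level of dep and draw each group's fitted line separately. Within each subset, order the points by age so the line runs from left to right: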
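# Scatter plot of all points, coloured and shaped by gender
with(df, plot(age, salary,
              col = ifelse(gender == 0, "red", "blue"),
              pch = ifelse(gender == 0, 19, 15)))
# One fitted line per level of dep, with x and y both ordered by age
for (grp in unique(df$dep)) {
  with(df[df$dep == grp, ], lines(sort(age), fit2[order(age)], col = "blue"))
}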
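If you want this for an arbitrary number of groups, with one colour per line and a legend, something along these lines might work. It is only a sketch: it rebuilds the question's data so that it runs on its own, and ord, groups and line_cols are names made up for this example, not anything from ggplot2 or the question.

```r
# Rebuild the question's data so the sketch is self-contained.
set.seed(1)
age <- sample(20:65, 24)
df <- data.frame(
  age = age,
  dep = factor(rep(c("A1", "A2"), times = c(8, 16))),
  gender = factor(c(0, 1, 0, 1, 0, 1, 0, 1,
                    1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1))
)
df$salary <- ifelse(df$dep == "A1", 80000, 70000) + 100 * df$age +
  rnorm(nrow(df), 0, 1000)
fit2 <- lm(salary ~ age + dep, data = df)
df$fit2 <- predict(fit2)

# Sort once by age, then split by dep: each piece is already in x order.
ord <- df[order(df$age), ]
groups <- split(ord, ord$dep)

# One colour per level of dep (hypothetical choice: palette entries 2, 3, ...).
line_cols <- setNames(seq_along(groups) + 1, names(groups))

plot(df$age, df$salary, xlab = "age", ylab = "salary",
     pch = ifelse(df$gender == 0, 19, 15),
     col = ifelse(df$gender == 0, "red", "blue"))

# Draw one fitted line per group, whatever the number of groups.
for (g in names(groups)) {
  lines(groups[[g]]$age, groups[[g]]$fit2, col = line_cols[[g]], lwd = 2)
}
legend("topleft", legend = names(groups), col = line_cols, lwd = 2,
       title = "dep")
```

Sorting before split() means the loop body only has to call lines(), and it works unchanged if dep has more than two levels.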
Upvotes: 2