Reputation: 2807
I would like to illustrate the difference between a regression that pools the data (ignoring groups) and one that controls for groups. The data come from http://www.jblumenstock.com/files/courses/econ174/FEModels.pdf
paneldata <- data.frame(
  Location = c("Chicago", "Chicago", "Peoria", "Peoria",
               "Milwaukee", "Milwaukee", "Madison", "Madison"),
  Year     = rep(2003:2004, 4),
  Price    = c(75, 85, 50, 48, 60, 65, 55, 60),
  Quantity = c(2.0, 1.8, 1.0, 1.1, 1.5, 1.4, 0.8, 0.7))
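For reference, the two regressions being contrasted can be sketched numerically with base R's lm(). This is only a minimal illustration: the object names pooled and fixed are made up here, and entering Location as a plain factor is one assumed way of "controlling for groups".

```r
# Same data as above, repeated so the snippet runs on its own
paneldata <- data.frame(
  Location = c("Chicago", "Chicago", "Peoria", "Peoria",
               "Milwaukee", "Milwaukee", "Madison", "Madison"),
  Year     = rep(2003:2004, 4),
  Price    = c(75, 85, 50, 48, 60, 65, 55, 60),
  Quantity = c(2.0, 1.8, 1.0, 1.1, 1.5, 1.4, 0.8, 0.7))

# Pooled regression: one line through all eight points, groups ignored
pooled <- lm(Quantity ~ Price, data = paneldata)

# Controlling for groups: Location enters as a factor, so each city
# gets its own intercept while the Price slope is shared
fixed <- lm(Quantity ~ Price + Location, data = paneldata)

# Compare the Price slopes: positive when pooled, negative within cities
coef(pooled)["Price"]
coef(fixed)["Price"]
```

The sign flip between the two slopes is what the combined plot is meant to show.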
Since geom_smooth automatically picks up the aesthetics set in ggplot(), I can draw a line either for all data points or for each group. But I would like to have both in the same diagram.
library(ggplot2)
library(gridExtra)
plot_pool <- ggplot(paneldata, aes(x = Price, y = Quantity)) +
  geom_point(aes(colour = Location)) +
  labs(title = "Relationship between Price and Quantity",
       x = "Price", y = "Quantity") +
  geom_smooth(method = "lm", se = FALSE)
plot_groups <- ggplot(paneldata, aes(x = Price, y = Quantity, colour = Location)) +
  geom_point() +
  labs(title = "Relationship between Price and Quantity",
       x = "Price", y = "Quantity") +
  geom_smooth(method = "lm", se = FALSE)
grid.arrange(plot_pool, plot_groups, ncol=2)
Upvotes: 0
Views: 626
Reputation: 721
You just need to add another geom_smooth() with different aesthetics:
ggplot(paneldata, aes(x = Price, y = Quantity)) +
  geom_point() +
  labs(title = "Relationship between Price and Quantity",
       x = "Price", y = "Quantity") +
  # one regression line per Location
  geom_smooth(aes(colour = Location), method = "lm", se = FALSE) +
  # one pooled regression line through all points
  geom_smooth(method = "lm", se = FALSE)
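If colour = Location is instead set in the top-level ggplot() call, as in plot_groups from the question, a variation on the same idea might look like the sketch below. The aes(group = 1) override, the black pooled line, and the explicit formula = y ~ x are my own assumptions rather than anything the question requires.

```r
library(ggplot2)

# Data from the question, repeated so the snippet runs on its own
paneldata <- data.frame(
  Location = c("Chicago", "Chicago", "Peoria", "Peoria",
               "Milwaukee", "Milwaukee", "Madison", "Madison"),
  Year     = rep(2003:2004, 4),
  Price    = c(75, 85, 50, 48, 60, 65, 55, 60),
  Quantity = c(2.0, 1.8, 1.0, 1.1, 1.5, 1.4, 0.8, 0.7))

# colour = Location is global here, so points and the first smooth inherit it
p <- ggplot(paneldata, aes(x = Price, y = Quantity, colour = Location)) +
  geom_point() +
  # per-group lines: grouping follows the inherited colour mapping
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # pooled line: group = 1 puts all rows in one group, and a constant
  # colour set outside aes() replaces the inherited mapping for this layer
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, colour = "black") +
  labs(title = "Relationship between Price and Quantity",
       x = "Price", y = "Quantity")

print(p)
```

This keeps the points coloured by Location and makes the pooled line easy to tell apart from the group lines.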
Upvotes: 3