Reputation: 105
I am new to visualizing regression results and need help plotting predicted values from a linear regression model.
Data structure obtained with `dput`:
`> dput(head(dat,4))
structure(list(X = c(809L, 3L, 1L, 2L), cntry = structure(c(1L,
1L, 1L, 1L), .Label = c("AT", "BE", "CH", "CZ", "DE", "DK", "ES",
"FI", "GB", "GR", "HU", "IE", "IL", "NL", "NO", "PL", "PT", "SE",
"SI", "EE", "IS", "LU", "SK", "TR", "UA", "BG", "CY", "FR", "RU",
"HR", "LV", "RO", "LT", "AL", "IT", "XK"), class = "factor"),
ipshabt = c(4L, 2L, 3L, 2L), ipsuces = c(3L, 2L, 3L, 1L),
imprich = c(3L, 3L, 3L, 3L), iprspot = c(3L, 3L, 4L, 4L),
impsafe = c(3L, 3L, 2L, 4L), ipstrgv = c(2L, 2L, 2L, 3L),
ipfrule = c(3L, 2L, 1L, 6L), ipbhprp = c(3L, 2L, 4L, 4L),
ipmodst = c(3L, 3L, 2L, 5L), imptrad = c(2L, 2L, 1L, 6L),
ipeqopt = c(1L, 2L, 1L, 1L), ipudrst = c(1L, 2L, 3L, 3L),
impenv = c(3L, 2L, 2L, 1L), iphlppl = c(1L, 2L, 1L, 4L),
iplylfr = c(2L, 2L, 2L, 3L), ipcrtiv = c(2L, 2L, 2L, 1L),
impfree = c(2L, 3L, 1L, 1L), impdiff = c(2L, 3L, 3L, 1L),
ipadvnt = c(6L, 4L, 3L, 1L), ipgdtim = c(2L, 2L, 1L, 1L),
impfun = c(6L, 5L, 1L, 3L), gndr = c(2L, 2L, 1L, 1L), agea = c(69L,
63L, 54L, 50L), hincfel = c(1L, 2L, 1L, 3L), educ = c(1L,
2L, 3L, 3L), year = c(2002L, 2002L, 2002L, 2002L), Achievement = c(3.5,
2, 3, 1.5), Power = c(3, 3, 3.5, 3.5), Security = c(2.5,
2.5, 2, 3.5), Conformity = c(3, 2, 2.5, 5), Tradition = c(2.5,
2.5, 1.5, 5.5), Universalism = c(1.66666666666667, 2, 2,
1.66666666666667), Benevolence = c(1.5, 2, 1.5, 3.5), SelfDirection = c(2, 2.5, 1.5, 1), Stimulation = c(4, 3.5, 3, 1), Hedonism = c(4,
3.5, 1, 2), SelfEnh = c(3.25, 2.5, 3.25, 2.5), SelfTran = c(1.6,
2, 1.8, 2.4), Cons = c(2.66666666666667, 2.33333333333333,
2, 4.66666666666667), Open = c(3.33333333333333, 3.16666666666667,
1.83333333333333, 1.33333333333333), SelfTranNet = c(-1.65,
-0.5, -1.45, -0.1), OpenNet = c(0.666666666666667, 0.833333333333333,
-0.166666666666667, -3.33333333333333), east = c(0, 0, 0,
0), eastyear = c(0, 0, 0, 0), income = c(1L, 2L, 1L, 3L),
year2002 = c(1, 1, 1, 1), eastyear2002 = c(0, 0, 0, 0), year2004 = c(0,
0, 0, 0), eastyear2004 = c(0, 0, 0, 0), year2006 = c(0, 0,
0, 0), eastyear2006 = c(0, 0, 0, 0), year2008 = c(0, 0, 0,
0), eastyear2008 = c(0, 0, 0, 0), year2010 = c(0, 0, 0, 0
), eastyear2010 = c(0, 0, 0, 0), year2012 = c(0, 0, 0, 0),
eastyear2012 = c(0, 0, 0, 0), year2014 = c(0, 0, 0, 0), eastyear2014 = c(0,
0, 0, 0), year2016 = c(0, 0, 0, 0), eastyear2016 = c(0, 0,
0, 0)), .Names = c("X", "cntry", "ipshabt", "ipsuces", "imprich",
"iprspot", "impsafe", "ipstrgv", "ipfrule", "ipbhprp", "ipmodst",
"imptrad", "ipeqopt", "ipudrst", "impenv", "iphlppl", "iplylfr",
"ipcrtiv", "impfree", "impdiff", "ipadvnt", "ipgdtim", "impfun",
"gndr", "agea", "hincfel", "educ", "year", "Achievement", "Power",
"Security", "Conformity", "Tradition", "Universalism", "Benevolence",
"SelfDirection", "Stimulation", "Hedonism", "SelfEnh", "SelfTran",
"Cons", "Open", "SelfTranNet", "OpenNet", "east", "eastyear",
"income", "year2002", "eastyear2002", "year2004", "eastyear2004",
"year2006", "eastyear2006", "year2008", "eastyear2008", "year2010",
"eastyear2010", "year2012", "eastyear2012", "year2014", "eastyear2014",
"year2016", "eastyear2016"), row.names = c(NA, 4L), class = "data.frame")`
My linear regression model:
modelAchievement <- lm(Achievement~east+year+year2002+eastyear2002+year2004+eastyear2004+year2006+eastyear2006+year2008+eastyear2008+year2010+eastyear2010+year2012+eastyear2012+year2014+eastyear2014+year2016+eastyear2016+agea+gndr+income+educ, data = dat)
Now I want to draw two lines of predicted values of the dependent variable, `Achievement`, on the same plot, with `Achievement` on the y-axis and `year` on the x-axis: one line for the dummy variable `east` = 1 and one for `east` = 0.
I didn't know how to proceed and tried `ggplot(modelAchievement, aes(y = Achievement, x = year))`, but it gives an empty plot.
Any advice will be greatly appreciated.
Link to full data: data
Upvotes: 2
Views: 6139
Reputation: 13128
Based on your formula, it appears that you're interacting `east` with each level of `year`, which we can express more compactly as:
fit <- lm(Achievement ~ east * factor(year) + agea + gndr + income + educ, data = dat)
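To see what the `*` operator expands to, here is a minimal sketch on simulated data. The data frame `toy` and the model `fit_toy` are made up for illustration and are not the poster's survey data. With `factor(year)`, the first year (2002) serves as the reference level, so the fit has an intercept, one `east` term, seven year terms and seven interaction terms.

```r
# Toy data, invented for illustration only (not the poster's data)
set.seed(1)
toy <- expand.grid(year = seq(2002, 2016, 2), east = 0:1, rep = 1:20)
toy$Achievement <- 3 + 0.2 * toy$east + rnorm(nrow(toy))

# `east * factor(year)` is shorthand for
# east + factor(year) + east:factor(year)
fit_toy <- lm(Achievement ~ east * factor(year), data = toy)

# Inspect which terms were estimated
coef_names <- names(coef(fit_toy))
print(coef_names)
```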
To compute predicted outcomes for different values of `east` and `year`, we first have to define values for the other 4 variables, `agea`, `gndr`, `income` and `educ`. I set these values at their sample means, although you can use any values you want.
library(dplyr)
new_dat <- summarise_at(dat, vars(agea, gndr, income, educ), mean)
# agea gndr income educ
# 1 47.88262 1.536708 2.031206 3.16173
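If you would rather not depend on dplyr for this step, a base R sketch along the following lines should give the same one-row data frame. The small `toy` data frame reuses the four rows shown in the question. The `na.rm = TRUE` argument is an assumption on my part: it only matters if the full data contain missing values, in which case a plain `mean` would return `NA`.

```r
# Four rows copied from the dput() output in the question
toy <- data.frame(
  agea   = c(69, 63, 54, 50),
  gndr   = c(2, 2, 1, 1),
  income = c(1, 2, 1, 3),
  educ   = c(1, 2, 3, 3)
)

covars <- c("agea", "gndr", "income", "educ")

# colMeans() returns a named vector; t() turns it into a 1-row matrix,
# which as.data.frame() converts to a 1-row data frame
covar_means <- as.data.frame(t(colMeans(toy[covars], na.rm = TRUE)))
print(covar_means)
```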
We then combine this dataframe with another dataframe that has all the combinations of `east` and `year`.
new_dat <- cbind(expand.grid(year = seq(2002, 2016, 2), east = 0:1), new_dat)
new_dat
# year east agea gndr income educ
# 1 2002 0 47.88262 1.536708 2.031206 3.16173
# 2 2004 0 47.88262 1.536708 2.031206 3.16173
# 3 2006 0 47.88262 1.536708 2.031206 3.16173
# 4 2008 0 47.88262 1.536708 2.031206 3.16173
# 5 2010 0 47.88262 1.536708 2.031206 3.16173
# 6 2012 0 47.88262 1.536708 2.031206 3.16173
# 7 2014 0 47.88262 1.536708 2.031206 3.16173
# 8 2016 0 47.88262 1.536708 2.031206 3.16173
# 9 2002 1 47.88262 1.536708 2.031206 3.16173
# 10 2004 1 47.88262 1.536708 2.031206 3.16173
# 11 2006 1 47.88262 1.536708 2.031206 3.16173
# 12 2008 1 47.88262 1.536708 2.031206 3.16173
# 13 2010 1 47.88262 1.536708 2.031206 3.16173
# 14 2012 1 47.88262 1.536708 2.031206 3.16173
# 15 2014 1 47.88262 1.536708 2.031206 3.16173
# 16 2016 1 47.88262 1.536708 2.031206 3.16173
We then use `predict` to compute predicted outcomes for this new dataset:
new_dat$predicted <- predict(fit, new_dat)
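If you also want uncertainty around the predicted values, `predict` for `lm` objects accepts `interval = "confidence"` and then returns a matrix with columns `fit`, `lwr` and `upr`. The sketch below uses simulated data (`toy`, `fit_toy` and `grid` are hypothetical names, and the model has only one covariate to keep it short), so the numbers themselves mean nothing.

```r
# Simulated data, invented for illustration only
set.seed(42)
toy <- expand.grid(year = seq(2002, 2016, 2), east = 0:1, rep = 1:30)
toy$agea <- round(runif(nrow(toy), 18, 80))
toy$Achievement <- 3 + 0.3 * toy$east + 0.01 * toy$agea + rnorm(nrow(toy))

fit_toy <- lm(Achievement ~ east * factor(year) + agea, data = toy)

# Prediction grid: every east/year combination, agea held at its mean
grid <- expand.grid(year = seq(2002, 2016, 2), east = 0:1)
grid$agea <- mean(toy$agea)

# 95% confidence interval for the mean prediction at each grid row
ci <- predict(fit_toy, newdata = grid, interval = "confidence", level = 0.95)
grid <- cbind(grid, ci)
head(grid)
```

The `lwr` and `upr` columns could then be passed to `geom_ribbon(aes(ymin = lwr, ymax = upr))` if you want bands around the lines.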
Now we can plot:
library(ggplot2)
ggplot(new_dat, aes(x = year, y = predicted, colour = factor(east), group = east)) +
geom_line()
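As for the empty plot in your question: a `ggplot()` call by itself only declares the data and the aesthetic mappings, and nothing is drawn until a geom layer is added. A small sketch on a made-up data frame (`pred_toy` and its values are hypothetical) shows the difference:

```r
library(ggplot2)

# Made-up predictions, for illustration only
pred_toy <- expand.grid(year = seq(2002, 2016, 2), east = 0:1)
pred_toy$predicted <- 3 + 0.2 * pred_toy$east + 0.01 * (pred_toy$year - 2002)

# No layer yet: printing this shows an empty panel with axes only
p_empty <- ggplot(pred_toy, aes(x = year, y = predicted))

# Adding geom_line() draws one line per value of east
p_line <- p_empty +
  geom_line(aes(colour = factor(east), group = east)) +
  labs(y = "Predicted Achievement", colour = "east")

# print(p_empty)  # blank canvas
# print(p_line)   # two lines
```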
Upvotes: 3