Reputation: 25
I am trying to make a partial regression plot of a linear model, on which the y-axis shows the actual values vs x-axis but the displayed trend line is from the predict() function. As it is, my plot is showing jagged confidence intervals. How can I fix this? My model is looking at the effect of a metric of sleep disturbance on gene expression, controlling for age, sex, and education.
This is what I've done:
fit1<-lm(formula = (DF[, "gene.expression"]) ~ DF[, "sleep.metric"] +
DF[, "age_death"] + DF[, "msex"] + DF[, "educ"])
pred<-fit1[["terms"]][[3]][[2]][[2]][[2]][[4]]
outcome<-fit1[["terms"]][[2]][[2]][[4]]
covars_list<-list(c('age_death', 'msex', 'educ'))
n<-length(fit1[["residuals"]])
b1slope<-coef(fit1)[2] #beta of sleep
x_var<-eval(parse(text=paste('DF$',pred,'[!is.na(DF$',pred,')]'))#values of sleep
b0intercept<-coef(fit1)['(Intercept)']
resid<-fit1$resid
y.fit<-fit1$fitted.values
#add back residuals to fitted vals to get actual y
y.adj<-b1slope*x+b0intercept + resid #gene expression
Since my data is scaled and centered, I make a new data frame containing values of sleep.metric and 0's for all other variables
covars_list<-list(c('age_death', 'msex', 'educ'))
terms<-c(pred, outcome, unlist(covars_list))
txt<-paste0('c("',paste(terms, collapse='","'),'")')
nf<-paste0('DF[,',txt,']', sep="")
nf<-eval(parse(text=nf))
nf[,2:dim(nf)[2]]=0
head(nf)
sleep.metric gene.exp age_death msex educ
45 0.07818673 0 0 0 0
81 0.13795502 0 0 0 0
131 -0.34721989 0 0 0 0
132 -0.74577821 0 0 0 0
189 -0.13113761 0 0 0 0
190 0.25137619 0 0 0 0
Then I predict the model on the new data frame:
p<-predict(fit1, newdata=data.frame(nf), se.fit=F, interval='confidence',level = 0.95)
p<-na.omit(p)
x<-x[order(x)]
y.adj<-y.adj[order(x)]
p<-p[order(x),]
fit<-p[,1]
lower<-p[,2]
upper<-p[,3]
plot(y.adj~x,
pch=16, col='gray', cex=0.5,
xlab="A metric of sleep distrubances \n adjusted for Gene expression, Age, Sex, and Education",
ylab="Adjusted Normalized z-score of Differentially Expressed Genes")
lines(fit~x,col="blue", lty=1)
lines(upper~x, col="lightblue", lty=2)
lines(lower~x, col="lightblue", lty=2)
This is what the plot looks like. What is causing the error and how can I fix it?
Reproducible data:
structure(list(sleep.metric = structure(c(0.0781867305884893,
0.137955020341975, -0.347219890225737, -0.745778209248931, -0.131137608705511,
0.251376191355564, -0.217902786210877, 1.76686965616634, -0.881129345174544,
-0.338245867206808, -0.756229888108794, 0.0798782710429454, -0.0564623211268888,
0.813854115467372, -0.587134441516757, 0.488293753134617, -0.983792312139819,
-0.38088203036309, 2.34506200421004, -0.398055018083532, 0.584613714614867,
0.176497570138307, 1.2536702229551, 1.64369019030153, 0.133885021086697,
-0.0361346005839536, -1.06575324210881, 1.00416057707554, -0.259015192681828,
-0.66924314057741, -0.546578627299251, -0.0280695749058621, -0.75954200307115,
-0.0358146295855893, 0.760677060077987, -0.326347340016168, -0.00983890269857277,
-0.127636678678207, 3.21535638854174, 0.411652347522696, 0.245305372059335,
-0.189423100931081, -0.94097131700755, 0.185616847672074, 0.574133270604647,
-0.971249603139539, -0.956920319730864, 1.3523792974609, -0.421942384264533,
-0.32340367612516, -0.421516969279265, 1.16056079266555, -1.02172158952876,
-0.24581750410861, 0.43187137567648, 2.27715346247963, -0.290347044060011,
-0.441085933096158, -1.09302649309957, -0.630833584279842, 0.241891944642241,
0.480525029463393, -0.884455035734616, 0.443127110103121, -0.941735119739369,
-0.22913662262526, -0.874299645910037, -0.480905338580742, -0.126670983273721,
0.3954498094977, -0.560048458910742, 2.89532008524038, 0.0248226778272163,
-1.01121184097596, -0.591890853074113, -0.0886937889251264, 0.12688808328598,
-0.755013065462159, -0.314085327615274, -0.69410710727193, -0.431930508495868,
-0.270739389849264, -0.958762712357257, 0.592175108187459, 0.00116097605735023,
0.353629682458558, 0.903042354370769, 3.9080196959324, -0.206449868267859,
-0.807964781108941, -1.29017423180506, -0.716669021061946, 2.31801500883332,
0.178197079517556, -0.246897299441486, 0.917814572819545, 2.00874768622414,
-0.348801897152664, -0.329474505063542, -1.14930279739887), dim = c(100L,
1L), dimnames = list(NULL, "sleep.metric")), gene.exp = c(0.275120454231549,
0.0419280830612593, 0.722379194844519, -1.39609332627313, 0.551239392320064,
0.97901873056573, -0.399674889566337, 0.979173460441012, -0.810313122567038,
0.00876336589549858, -2.19209451115226, 0.134846104871551, 0.256946914794601,
0.492550391524167, -0.853320990740947, -0.0120410043132865, -0.202692199045417,
0.103227281410324, -0.177077954734572, 2.15979763395388, 0.686134433829806,
0.581770468764468, -0.806179875680857, 1.20352295985308, -0.945640461330586,
0.0319206924653319, 1.09556492542714, -0.0255184411880729, 0.025466550076095,
1.09369718694631, -0.088756079965307, -0.622996922654691, 0.0504432952149795,
2.54464950368199, -2.55630250659539, -0.276977013433171, -0.111992592793995,
0.750456816756037, 0.202195235510047, 1.46146761878122, 0.514227599580235,
-0.171147260973635, -2.21759420187366, 2.9150319169828, 0.573452729363554,
-0.0143524691604244, -1.04351802753689, 0.506067507086837, 0.453760934645183,
1.95816151030286, -1.23933627256704, 0.517438047690363, 0.626946766679823,
0.656383131452248, -0.0316953893592668, 0.867615498212971, 0.486984831899531,
-0.393734728078153, 1.20924323094161, -0.238606524540738, 1.64298430745229,
-0.523702493683108, 0.113874319403116, -2.23148573223598, 0.187135218506304,
-0.682949228687968, 0.125437770480395, 1.79521391805445, 0.849704034542079,
-0.199298587988189, -0.549537473142907, -0.212745098371464, -1.2161104070599,
-1.7282501488871, 0.910264861757878, 0.786451422997723, 0.521099005840089,
-0.15115309548354, -0.312817218722264, -0.62314864970678, -2.803984993633,
0.53849755359447, 1.41013187683686, -0.37653130943696, -1.1304906604667,
-0.741354535617685, 1.39141157265084, 0.393506025968128, -0.801363230280356,
-0.426166102849928, -0.293413062589994, 0.290205552678512, -1.76810792224511,
-0.508942288526144, 0.299288361499007, -0.551341112055747, -1.17759267343941,
0.334004331904178, 0.333270055689333, 0.835918788892886), age_death = structure(c(0.625453432854603,
-1.67366014838442, -0.091366588904174, 0.372377661813803, 1.16717238059596,
1.47405525250668, -0.0193197928240909, -0.229076287740785, 0.41706491507867,
0.480447855933921, -0.710148248908413, 0.749483360283598, 0.345018118998587,
0.47862388641291, 0.462664153104036, -1.02067905986115, 0.163989144037886,
0.00758375761087674, 0.936896228567856, -0.296107167638072, 0.342282164717062,
0.300786858113982, 0.0440631480311606, 1.75357858160223, -0.343530375184464,
1.08600573691081, 0.501879497805858, -0.602534047168543, 0.108358073646939,
1.92731167847883, 0.210044374443498, 0.427096747444251, 2.12065244770638,
-1.44520796587732, -0.814114511606248, 0.979303519931451, -0.329394611396599,
1.34637738603565, 1.43073597638257, -0.653605193756955, 0.445792435034647,
0.386057433221419, -0.147453651675369, -0.970063905653007, 0.279355216242061,
-0.0489592975405808, 0.90953668575264, -0.818218443028532, 0.97337561898814,
0.322674492366151, 0.0896623860565266, -1.48077537153711, -0.284251365751481,
0.0271914299617715, 1.02034283415428, -0.806362641141927, 1.00438310084539,
0.387881402742428, -0.19670082874277, 0.0586549041992897, -0.100030444128994,
-2.4474792176749, -1.72564327973334, -0.189404950658714, 0.276163269580288,
-0.159765445942222, -0.539607098693542, 0.799642522111495, -0.00609601379673946,
0.246523764863798, -0.342162398043703, 0.915920579076183, -2.37224047493305,
-0.658165117559489, 0.858921531544477, 1.12157314257059, 0.359153882786451,
0.340458195196053, 0.719843855567105, 0.752675306945368, 0.322218499985903,
-2.51724605185371, -0.234092203923567, -0.633085536645531, 0.286651094326117,
1.02718271985807, -0.0288956328094213, -0.540519083454038, 0.322218499985903,
-2.46435093574429, 1.59398124851332, 0.204116473500205, -0.091366588904174,
-1.31616212226553, -0.105958345072287, -2.31797738168286, -1.39732876595068,
1.20957967195955, 0.986599398015508, -0.498111792090445), dim = c(100L,
1L)), msex = structure(c(-0.618392786628545, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, -0.618392786628545,
1.61319857381359, -0.618392786628545, 1.61319857381359, -0.618392786628545,
1.61319857381359, 1.61319857381359, -0.618392786628545, -0.618392786628545,
-0.618392786628545, -0.618392786628545, 1.61319857381359, 1.61319857381359,
-0.618392786628545, -0.618392786628545, 1.61319857381359, -0.618392786628545,
-0.618392786628545, -0.618392786628545, -0.618392786628545, 1.61319857381359,
-0.618392786628545, -0.618392786628545, 1.61319857381359, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, -0.618392786628545,
1.61319857381359, -0.618392786628545, -0.618392786628545, 1.61319857381359,
1.61319857381359, 1.61319857381359, -0.618392786628545, 1.61319857381359,
-0.618392786628545, -0.618392786628545, -0.618392786628545, 1.61319857381359,
-0.618392786628545, 1.61319857381359, -0.618392786628545, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, -0.618392786628545,
-0.618392786628545, -0.618392786628545, 1.61319857381359, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, 1.61319857381359,
1.61319857381359, -0.618392786628545, -0.618392786628545, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, 1.61319857381359,
-0.618392786628545, 1.61319857381359, -0.618392786628545, 1.61319857381359,
-0.618392786628545, -0.618392786628545, -0.618392786628545, -0.618392786628545,
-0.618392786628545, -0.618392786628545, -0.618392786628545, -0.618392786628545,
-0.618392786628545, 1.61319857381359, -0.618392786628545, 1.61319857381359,
-0.618392786628545, -0.618392786628545, -0.618392786628545, -0.618392786628545,
-0.618392786628545, -0.618392786628545, 1.61319857381359, -0.618392786628545,
-0.618392786628545, -0.618392786628545, 1.61319857381359, -0.618392786628545,
-0.618392786628545, -0.618392786628545), dim = c(100L, 1L)),
educ = structure(c(0.379218270712223, 1.10278416655393, 0.379218270712223,
-1.0679135209712, 0.0174353227913668, -0.706130573050345,
2.1881330103165, -1.0679135209712, 1.10278416655393, 1.82635006239565,
1.10278416655393, 0.379218270712223, -0.344347625129489,
1.10278416655393, 1.10278416655393, -1.79147941681291, 2.91169890615821,
0.379218270712223, -1.42969646889206, -0.706130573050345,
0.741001218633079, 0.379218270712223, 0.379218270712223,
-1.0679135209712, -1.0679135209712, 0.379218270712223, -0.706130573050345,
1.10278416655393, 1.46456711447479, 0.741001218633079, -1.0679135209712,
-0.344347625129489, -0.344347625129489, -1.0679135209712,
0.741001218633079, 0.741001218633079, 0.379218270712223,
0.379218270712223, -0.344347625129489, -0.344347625129489,
-0.344347625129489, -1.0679135209712, 0.741001218633079,
0.379218270712223, 0.379218270712223, 1.10278416655393, -1.42969646889206,
0.379218270712223, 0.379218270712223, -1.0679135209712, 1.10278416655393,
0.379218270712223, -1.0679135209712, -2.51504531265462, 0.379218270712223,
1.82635006239565, 0.379218270712223, 1.46456711447479, -1.79147941681291,
0.741001218633079, 0.379218270712223, -1.0679135209712, 0.0174353227913668,
0.0174353227913668, -1.0679135209712, -1.0679135209712, 0.379218270712223,
0.379218270712223, -1.0679135209712, 2.1881330103165, 0.741001218633079,
-0.344347625129489, -0.344347625129489, 0.379218270712223,
-0.344347625129489, -1.0679135209712, 0.379218270712223,
1.10278416655393, 1.10278416655393, 1.10278416655393, -0.706130573050345,
-0.706130573050345, 0.379218270712223, -1.0679135209712,
-1.0679135209712, -1.0679135209712, -1.0679135209712, 0.379218270712223,
1.82635006239565, -1.0679135209712, 0.379218270712223, -1.0679135209712,
-2.15326236473377, -0.344347625129489, 1.46456711447479,
-0.706130573050345, 0.379218270712223, -0.706130573050345,
-0.706130573050345, -0.344347625129489), dim = c(100L, 1L
))), row.names = c(45L, 81L, 131L, 132L, 189L, 190L, 191L,
192L, 193L, 194L, 196L, 197L, 198L, 201L, 202L, 217L, 226L, 227L,
228L, 230L, 232L, 233L, 234L, 235L, 236L, 239L, 240L, 241L, 242L,
243L, 244L, 245L, 397L, 409L, 511L, 529L, 533L, 534L, 535L, 537L,
540L, 542L, 543L, 544L, 545L, 547L, 548L, 549L, 551L, 552L, 554L,
555L, 556L, 557L, 559L, 565L, 566L, 568L, 572L, 573L, 574L, 575L,
577L, 579L, 581L, 582L, 583L, 586L, 589L, 590L, 591L, 592L, 594L,
596L, 597L, 598L, 599L, 600L, 601L, 602L, 603L, 605L, 606L, 608L,
609L, 610L, 611L, 614L, 615L, 617L, 618L, 619L, 620L, 621L, 622L,
623L, 625L, 626L, 627L, 628L), class = "data.frame")
Upvotes: 1
Views: 335
Reputation: 17823
You could use the plot_cap()
function from the marginaleffects
package to do much of the heavy lifting for you. (Disclaimer: I am the maintainer.)
library(ggplot2)
library(marginaleffects)
# center
DF[] <- lapply(DF, scale)
DF[] <- lapply(DF, c)
# fit
fit1 <- lm(gene.exp ~ sleep.metric + age_death + msex + educ, data = DF)
# plot
plot_cap(fit1, condition = "sleep.metric") +
geom_point(data = DF, aes(sleep.metric, gene.exp)) +
labs(x = "Sleep metric", y = "Gene expression") +
theme_minimal()
Upvotes: 1
Reputation: 174468
Your code seems more complex than it needs to be. Firstly, since three of your columns are actually single-column matrices rather than vectors, let's convert them into vectors for simplicity:
DF[] <- lapply(DF, c)
Now we can create our model with a tidier and more idiomatic use of the lm
function, which is much easier if you use the data
argument:
fit1 <- lm(gene.exp ~ sleep.metric + age_death + msex + educ, data = DF)
Now we create a data frame to feed into predict
. This should have an equally-spaced vector of values for sleep.metric
but 0 for all other variables:
pred_df <- data.frame(
sleep.metric = seq(min(DF$sleep.metric), max(DF$sleep.metric), length = 100),
age_death = 0,
msex = 0,
educ = 0)
When we plug this into the newdata
argument of predict
, we will get a list that includes the predictions and their standard errors. We can use these to create an upper and lower confidence interval for our prediction line:
preds <- predict(fit1, newdata = pred_df, se.fit = TRUE)
pred_df$gene.exp <- preds$fit
pred_df$upper <- preds$fit + 1.96 * preds$se.fit
pred_df$lower <- preds$fit - 1.96 * preds$se.fit
Now we can plot, using geom_ribbon
for our confidence interval:
ggplot(DF, aes(sleep.metric, gene.exp)) +
geom_point(alpha = 0.5) +
geom_ribbon(data = pred_df, aes(ymax = upper, ymin = lower), alpha = 0.3,
fill = "lightblue") +
geom_line(data = pred_df, color = "darkblue", linetype = 2) +
theme_minimal(base_size = 16)
Note that you are effectively plotting the marginal effect of sleep.metric
against the mean of the other covariates. In your case this results in a very similar plot to a straightforward geom_smooth(method = lm)
of the two variables shown in your plot, so without any of the above code, you could just do:
ggplot(DF, aes(sleep.metric, gene.exp)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", formula = y ~ x, fill = "lightblue",
color = "darkblue", linetype = 2, alpha = 0.3, linewidth = 0.5) +
theme_minimal(base_size = 16)
Upvotes: 1