Reputation: 51
I am trying to predict the mean abundance of animals sighted during different moon phases (factor), using log-transformed abundance data (better fit) and some other variables. The best model (lowest AIC) turned out to include the interaction of phase and survey duration and the cloud cover (both continuous):
LMoon<-lm(ln~Phase*Duration+Clouds, data=abund)
summary(LMoon)
Call:
lm(formula = ln ~ Phase * Duration + Clouds, data = abund)
Residuals:
Min 1Q Median 3Q Max
-1.75416 -0.46311 0.09522 0.46591 1.85978
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.382031 0.876865 0.436 0.664125
Phase2 2.130065 1.226305 1.737 0.085851 .
Phase3 1.971060 1.818542 1.084 0.281351
Phase4 0.608043 1.140122 0.533 0.595146
Phase5 4.786674 1.151850 4.156 7.44e-05 ***
Phase6 0.958706 1.046831 0.916 0.362238
Phase7 0.254711 3.425214 0.074 0.940888
Phase8 0.865995 1.043916 0.830 0.409005
Duration 0.069153 0.035407 1.953 0.053952 .
Clouds -0.004259 0.002401 -1.774 0.079494 .
Phase2:Duration -0.087843 0.047818 -1.837 0.069545 .
Phase3:Duration -0.089908 0.069652 -1.291 0.200109
Phase4:Duration -0.005424 0.046675 -0.116 0.907749
Phase5:Duration -0.172016 0.049369 -3.484 0.000768 ***
Phase6:Duration -0.035597 0.041435 -0.859 0.392583
Phase7:Duration 0.024084 0.176773 0.136 0.891939
Phase8:Duration -0.033424 0.042064 -0.795 0.428963
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7721 on 89 degrees of freedom
Multiple R-squared: 0.3368, Adjusted R-squared: 0.2176
F-statistic: 2.825 on 16 and 89 DF, p-value: 0.0009894
Now, due to this interaction, I need to make an interaction plot (CI are too wide when plotting the lsmeans). I've tried to use different functions mentioned out there but none of them worked. Apparently I need to calculate and plot manually which I did like this:
intercepts <- c(coef(LMoon)["(Intercept)"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase2"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase3"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase4"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase5"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase6"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase7"],
coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase8"])
lines.df <- data.frame(intercepts = intercepts,
slopes = c(coef(LMoon)["Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase2:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase3:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase4:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase5:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase6:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase7:Duration"],
coef(LMoon)["Duration"]+coef(LMoon)["Phase8:Duration"]),
Phase2 = levels(abund$Phase))
qplot(x = Duration, y = Sp2, color = Phase, data = abund) +
geom_abline(aes(intercept = intercepts,
slope = slopes,
color = Phase), data = lines.df)
The plot that I get is wrong as the y-values are on the original, true scale, but the lines are based on the lm which uses the log-transformed data.
interaction plot abundance, duration, lunar phases
To back-transform this someone told me that I won't get straight lines in the end, actually. Rather than using abline(), I should create a set of e.g. 100 new x values that cover the range of the duration data and use the coefficients to calculate your predicted y values. Then plot these using lines() and it should look like a smooth curve.
And this is where I get lost.
So I created a the set of new x-values for the range of the survey duration (min 15 max 45 min):
dur2 <- seq(from = 15, to = 45, length.out=100)
Then once I got those values I am supposed to get the predicted y value for each x-value using the coefficients of my LM. After that, back-transforming the y-values to the original scale. And then using the x-values and the back-transformed y-values to add the lines to the plot.
How do I get the predicted values now exactly? I cannot use any pred type/function, I've tried it all. It just doesn't work with my model, so manual is the only way, but I have no clue how...
Hope anyone can help me with this, I've been trying for weeks and in despair by now, close to giving up.
Cheers!
PS Here the data:
> dput(subset(abund, Phase %in% c("Phase1", "Phase2")))
structure(list(Year = integer(0), Date = structure(integer(0), .Label = c("01/08/2009",
"01/08/2016", "02/07/2019", "02/08/2009", "02/08/2012", "02/08/2016",
"02/09/2007", "03/08/2007", "03/08/2009", "03/08/2014", "03/08/2015",
"04/07/2019", "04/08/2009", "04/08/2013", "05/08/2009", "05/08/2014",
"05/08/2015", "06/07/2008", "06/07/2019", "07/08/2009", "08/07/2010",
"09/07/2010", "09/08/2015", "10/08/2009", "11/08/2009", "12/08/2009",
"13/08/2009", "13/08/2014", "14/08/2009", "14/08/2012", "16/07/2006",
"18/07/2009", "18/08/2015", "19/07/2011", "20/08/2009", "21/07/2011",
"21/09/2009", "22/07/2011", "22/07/2016", "22/07/2017", "23/07/2007",
"23/07/2016", "23/07/2017", "24/07/2017", "25/07/2007", "25/07/2010",
"25/07/2017", "25/08/2016", "26/07/2010", "26/07/2011", "27/07/2006",
"27/07/2011", "27/07/2012", "28/07/2016", "29/06/2019", "29/07/2005",
"29/07/2009", "29/07/2010", "29/07/2016", "29/07/2019", "30/07/2005",
"30/07/2007", "30/07/2016", "30/08/2005", "31/07/2005", "31/07/2009",
"31/07/2014", "31/07/2016"), class = "factor"), NrSurvey = integer(0),
Duration = integer(0), Sp2 = integer(0), Phase = structure(integer(0), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), Clouds = integer(0),
Visibility = integer(0), ln = numeric(0)), row.names = integer(0), class = "data.frame")
Upvotes: 1
Views: 428
Reputation: 146090
Use predict
to get predicted values. Don't calculate manually.
Use expand.grid()
to generate a data frame of all combinations of your dur2
sequence and the other predictors at the value(s) you want plot. Something like this:
prediction_data = expand.grid(
Duration = dur2,
Phase= unique(abund$Phase),
Clouds = mean(abund$Clouds) # Hold Clouds constant at some value
)
# column names in prediction_data need to match those in the model formula
prediction_data$pred = predict(LMoon, newdata = prediction_data)
prediction_data$pred_orig = exp(prediction_data$pred)
# plot
ggplot(prediction_data, aes(x = Duration, y = pred_orig, color = Phase)) +
geom_line() +
geom_point(data = abund)
Something like that should work.
Another nice option is to use broom::augment
to generate the predictions. This can easily give standard errors and residuals for each prediction point as well.
library(broom)
prediction_data = augment(LMoon, newdata = prediction_data)
Upvotes: 2