Reputation: 51
I am trying to predict the presence of a spieces sighted during different moon phases using glm() and family=binomial, incl a couple of other environmental variables such as seastate, wind direction and survey duration. The best model (lowest AIC) is:
> GLMoon <- glm(Gg~Phase*Duration+WindDir+Seastate, data=Pres, family=binomial)
> anova(GLMoon, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: Gg
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5783 7917.1
Phase 7 16.306 5776 7900.8 0.02246 *
Duration 1 58.762 5775 7842.0 1.779e-14 ***
WindDir 8 44.156 5767 7797.9 5.315e-07 ***
Seastate 3 103.786 5764 7694.1 < 2.2e-16 ***
Phase:Duration 7 18.191 5757 7675.9 0.01114 *
Since the interaction is significant I would like to plot this interaction as follows (based on a previous question, but on abundance data, see: Manual interaction plot linear regression in R)
> #range of survey duration 15 - 45 min
> dur2 <- seq(from = 15, to = 45, length.out=100)
>
> # use predict to get predicted values and expand.grid() to generate a data frame of all combinations of your dur2 sequence
>
> prediction_data = expand.grid(
+ Duration = dur2,
+ Phase = unique(Pres$Phase),
+ WindDir = unique(Pres$WindDir),
+ Seastate = unique(Pres$Seastate)
+ )
>
> # column names in prediction_data need to match those in the model formula
> prediction_data$pred = predict(GLMoon, newdata = prediction_data)
> prediction_data$predorg = exp(prediction_data$pred)
>
> prediction_data
Duration Phase WindDir Seastate pred predorg
1 15.00000 5 S 1 -0.203011901 0.8162685
2 16.03448 5 S 1 -0.176720796 0.8380137
3 17.06897 5 S 1 -0.150429692 0.8603382
4 18.10345 5 S 1 -0.124138587 0.8832574
5 19.13793 5 S 1 -0.097847482 0.9067872
6 20.17241 5 S 1 -0.071556378 0.9309438
7 21.20690 5 S 1 -0.045265273 0.9557439
8 22.24138 5 S 1 -0.018974169 0.9812047
9 23.27586 5 S 1 0.007316936 1.0073438
10 24.31034 5 S 1 0.033608041 1.0341792
11 25.34483 5 S 1 0.059899145 1.0617295
12 26.37931 5 S 1 0.086190250 1.0900137
13 27.41379 5 S 1 0.112481355 1.1190514
...
[ reached 'max' ...
This has worked with the abundance data (lm()), but now due to categorial factors included in the model I get multiple duration values that are the same for each factor basically, see:
I cannot average those factors when using the expand.grid()
to calculate predicted values, so I am not sure how I can fix this or if this is even possible at all?
The problem is that when plotting this the predicted value points get connected vertically due to multiple values of the same duration (but different seastate/wind direction).
> library(viridis)
> plot.interact1 <- prediction_data %>%
+ ggplot(aes(x=Duration, y=predorg, group=Phase, colour=Phase)) +
+ geom_line(size=0.5) +
+ guides(color=guide_legend(title="Lunar Phase")) +
+ ylab("Predicted mean presence") +
+ xlab("Survey duration") +
+ labs(fill="Lunar Phases")
>
> plot.interact1
Upvotes: 0
Views: 50
Reputation: 51
Worked it out -- needed to aggregate before back transforming the data like:
> prediction_data = expand.grid(
+ Duration = dur2,
+ Phase = unique(Pres$Phase),
+ WindDir = unique(Pres$WindDir),
+ Seastate = unique(Pres$Seastate)
+ )
> # column names in prediction_data need to match those in the model formula
> prediction_data$pred = predict(GLMoon, newdata = prediction_data)
> prediction_data
Duration Phase WindDir Seastate pred
1 15.00000 5 S 1 -0.203011901
2 15.30303 5 S 1 -0.195310466
3 15.60606 5 S 1 -0.187609032
4 15.90909 5 S 1 -0.179907597
5 16.21212 5 S 1 -0.172206162
6 16.51515 5 S 1 -0.164504728
7 16.81818 5 S 1 -0.156803293
8 17.12121 5 S 1 -0.149101858
9 17.42424 5 S 1 -0.141400423
10 17.72727 5 S 1 -0.133698989
...
[ reached 'max' / getOption("max.print") -- omitted 31800 rows ]
> # aggregate before back transform -- average over sea state and wind direction
> mean.pred <- aggregate(list(predmean=prediction_data$pred), by=list(Duration=prediction_data$Duration, Phase2.2=prediction_data$Phase2.2), mean, na.rm=TRUE)
> #back transform predicted means
> mean.pred$predmeanorg <- exp(mean.pred$predmean)
> mean.pred
Duration Phase predmean predmeanorg
1 15.00000 1 -0.5281409628 0.5897002
2 15.30303 1 -0.5136241636 0.5983232
3 15.60606 1 -0.4991073644 0.6070723
4 15.90909 1 -0.4845905652 0.6159493
5 16.21212 1 -0.4700737660 0.6249562
6 16.51515 1 -0.4555569668 0.6340947
7 16.81818 1 -0.4410401676 0.6433669
8 17.12121 1 -0.4265233684 0.6527746
9 17.42424 1 -0.4120065692 0.6623199
10 17.72727 1 -0.3974897700 0.6720048
...
[ reached 'max' / getOption("max.print") -- omitted 550 rows ]
Upvotes: 0