Reputation: 5211
I have fit a GAM-GEE in the {geepack}
package as I wanted to account for within individual residual autocorrelation for some spatial animal data with known individuals. I have a cyclic covariate I want to fit using a cyclic spline and so I did this by generating the basis functions for the spline first in {mgcv}
, as cyclic cubic splines are not currently available in the {splines}
package, and then using this as a covariate in the geeglm()
fit. I did this by first:
cyclic_basisfunc <- mgcv::gam(response ~s(cyclic_term, bs="cc"), fit=F, data=df, family = "binomial")$X %>%
as_tibble() %>%
dplyr::select(V2:ncol(.)) %>%
as.matrix()
and then:
fit1 <- geepack::geeglm(response ~ cyclic_basisfunc +
bs(beta1)+
bs(lon+lat),
family="binomial",
data=df,
id=as.factor(id),
corstr = "independence")
I can estimate the partial effects of the cyclic smooth by extracting the relevant elements of the model matrix which correspond to that term (in this case 8 basis functions for the cyclic term). However when it comes to predicting over a spatial grid, using the predict() function, I cannot figure out how to appropriately name the elements in the prediction grid. The input variable for the cyclic term was a matrix and any attempt to include the prediction values either as a numeric vector or a matrix returns faults.
I tried generating a prediction grid over my study area as such (n.b. have tried using both the name of the cyclic_basisfunc and cyclic_term as column headers):
predgrid <- data.frame(expand.grid(lat=seq(min(df$lat-0.1), max(df$lat+0.1), length.out = 20),
lon=seq(min(df$lon-0.1), max(df$lon+0.1), length.out = 20),
cyclic_basis_func=seq(min(cyclic_basis_func),
max(cyclic_basisfunc), length.out=20),
beta1=25)))
predict(fit1, predgrid)
and get the error:
Error: variable 'cyclic_basisfunc' was fitted with type "nmatrix.8" but type "numeric" was supplied
This makes sense as the original covariate was fit as an 8 column matrix, 1 for each basis function. Have tried inputting a column into the prediction gird as an 8 column matrix yet still this does not seem to work. Is there a way to generate an appropriate prediction grid for this model? (dput for sample of data below).
df <- structure(list(response = c(0.117, 0.915, 0.22, 0.284, 0.551,
0.871, 0.25, 0.261, 0.117, 0.875, 0.67, 0.533, 0.324, 0.138,
0.154, 0.286, 0.935, 0.744, 0.118, 0.224, 0.865, 0.13, 0.889,
0.115, 0, 0, 0.703, 0.917, 0.812, 0.14, 0.219, 0.114, 0.24, 0.163,
0.115, 0.228, 0, 0, 0.115, 0.106, 0.243, 0.13, 0.908, 0.117,
0.472, 0.95, 0.217, 0.133, 0.744, 0.92, 0.26, 0.138, 0.958, 0.113,
0.147, 0.132, 0.496, 0.148, 0.119, 0.186, 0.721, 0.889, 0.162,
0.157, 0.269, 0.25, 0.129, 0.357, 0.188, 0.361, 0.137, 0.128,
0.872, 0.121, 0.135, 0.466, 0.15, 0.589, 0.138, 0.134, 0.122,
0.463, 0.121, 0.369, 0.813, 0.145, 0.911, 0.17, 0.123, 0.649,
0.476, 0.119, 0.367, 0.135, 0.923, 0.875, 0.119, 0.115, 0.895,
0.923), cyclic_term = c(0.732222222222222, -2.28277777777778,
2.56777777777778, -3.43333333333333, -0.89, -5.68611111111111,
-3.47388888888889, -2.88277777777778, -1.79277777777778, 1.50333333333333,
-0.910555555555556, -4.14944444444444, -0.379027777777778, 0.113333333333333,
0.075, -3.99055555555556, -5.48388888888889, -1.84513888888889,
0.286111111111111, -1.24833333333333, -2.19652777777778, -0.532222222222222,
0.598333333333333, 5.43222222222222, -2.73222222222222, -1.11125,
3.16833333333333, -2.88055555555556, 1.90319444444444, 2.585,
-5.64333333333333, -3.79666666666667, 0.692083333333333, 5.80666666666667,
-4.42333333333333, 1.95666666666667, 2.61722222222222, -5.90055555555556,
1.63, 3.55222222222222, -4.20111111111111, -2.34, 3.39277777777778,
-4.32944444444444, 1.32222222222222, 4.74388888888889, 0.251111111111111,
0.0705555555555556, -1.84513888888889, 5.14305555555556, -3.92555555555556,
-2.34277777777778, 2.55777777777778, -3.75944444444444, 2.32277777777778,
1.82944444444444, -3.42111111111111, 3.22388888888889, -3.90055555555556,
1.94, -5.01888888888889, 4.93902777777778, -2.97388888888889,
4.11222222222222, 1.55055555555556, -2.29055555555556, 0.556666666666667,
1.40375, 3.52944444444444, 4.56555555555556, 1.30611111111111,
-2.59944444444444, 4.11166666666667, 6.005, 1.28111111111111,
-2.35333333333333, -0.125, 0.991666666666667, -4.29055555555556,
4.64055555555556, 1.19222222222222, -0.710555555555556, 0.125,
3.835, -3.79722222222222, 1.46, 0.455833333333333, -5.855, 2.59277777777778,
-1.42777777777778, 4.815, 0.508888888888889, -2.14333333333333,
-1.47444444444444, 5.01847222222222, 3.06666666666667, 5.92388888888889,
1.39944444444444, 5.00236111111111, 4.21666666666667), beta1 = c(32.95,
28.8, 32.15, 32.75, 34.7, 29.7, 28.95, 28.85, 32.25, 28.5, 33.5,
28.5, 30.75, 28.8, 32.4, 32.65, 29.7, 32.25, 29.7, 31.85, 32.15,
31.45, 31.25, 31.05, 38.7, 35.2, 31.65, 33, 32.05, 28.7, 31.85,
35.5, 31.25, 35.25, 33.25, 29.7, 35.5, 30.55, 35.45, 36, 33,
29.85, 31.15, 33.35, 34.8, 28.1, 35.35, 28.8, 32.25, 29.3, 29.7,
28.95, 28.4, 34.7, 28.5, 28.8, 28.5, 33.1, 36.35, 29, 26.95,
33.05, 32.2, 29.2, 30.35, 36, 29.7, 34.25, 34.1, 31.9, 32.05,
28.9, 33.3, 31.85, 32.55, 28.8, 29.1, 39.2, 28.95, 32.15, 28.8,
33.1, 29.5, 37.95, 32.85, 28.5, 30.3, 34.55, 28.15, 33.35, 35.35,
31.6, 35.95, 28.9, 31.1, 32.5, 35.7, 31.85, 32.95, 33.55), lon = c(-8.14386899769604,
-8.1572279376935, -8.15157242384156, -8.11266145447895, -8.15174118001044,
-8.15335952072546, -8.14600667978252, -8.15297882985764, -8.15568356665537,
-8.13472008705139, -8.09368161533273, -8.10491923518749, -8.1603014305949,
-8.15632063618518, -8.13543502792374, -8.09733172904193, -8.15868642071182,
-8.12876868592058, -8.15690393084368, -8.10847025857788, -8.15564957894737,
-8.16047965739412, -8.1538774272955, -8.13959002494812, -8.16031308174808,
-8.13153629064039, -8.10225327088153, -8.11704735322503, -8.10320579591837,
-8.09718723480212, -8.14769670963066, -8.15279598640478, -8.1536752924518,
-8.15005347845117, -8.11959004402161, -8.15327362169542, -8.15338984397156,
-8.15480377425017, -8.14843624352758, -8.1536150198704, -8.14265275265451,
-8.15419676931013, -8.14388546959556, -8.12423783110794, -8.15865186565657,
-8.12779523300791, -8.15498210353148, -8.15711005849511, -8.12876868592058,
-8.14498268712871, -8.12777905464012, -8.14658887045715, -8.14966988563538,
-8.15137416124342, -8.14757286777317, -8.15659830243711, -8.11739216850327,
-8.14816670318125, -8.15283383471808, -8.15503278645551, -8.12968355281506,
-8.11532096238244, -8.15388445232111, -8.12550097443724, -8.14214966153336,
-8.15406262640947, -8.15366204068896, -8.16073804747475, -8.14748077293754,
-8.10236112317726, -8.13306401111526, -8.15754008293152, -8.11496173845736,
-8.14744857712061, -8.13097980901942, -8.15712565747807, -8.16003438931358,
-8.16002796870351, -8.11892837135011, -8.13700008392334, -8.15721941772399,
-8.14819490909576, -8.1561399618782, -8.16012501716613, -8.11709369783742,
-8.13470092070733, -8.14629675, -8.15713679162397, -8.13686372838595,
-8.12430530737705, -8.15464372671311, -8.15669989585876, -8.16044796649062,
-8.15766701538992, -8.09362511111111, -8.13870000839233, -8.14998125100998,
-8.15243885077905, -8.12291705479452, -8.15384632745479), lat = c(33.6622974395803,
33.6600173368609, 33.6599819460086, 33.6598656189251, 33.6673908233704,
33.6593042234088, 33.6572338143965, 33.6580708565473, 33.6629485478539,
33.6547317504883, 33.6598712810572, 33.6567040290043, 33.6652851274788,
33.6596383685524, 33.6570077561196, 33.6611995549193, 33.661593106719,
33.6588793953069, 33.6614662478323, 33.6584457075095, 33.6641300278638,
33.6621415089752, 33.6598426484043, 33.6570205688477, 33.6727939605693,
33.6593126847291, 33.6591196082918, 33.6591313969, 33.6605346,
33.6553558936485, 33.6650662271625, 33.6653484273653, 33.6600826614748,
33.6642524699626, 33.6585006713867, 33.658733988916, 33.664975062454,
33.6610514512704, 33.6621421965042, 33.6687249002733, 33.6576841575676,
33.6600360803426, 33.6574947407709, 33.6584060573279, 33.6673802707071,
33.6550894507841, 33.6654066532252, 33.6595139325288, 33.6588793953069,
33.6568044356436, 33.6559013620991, 33.6568076288124, 33.6572189331055,
33.6670448613725, 33.6563744930416, 33.6598780530983, 33.6554991693199,
33.6602992307323, 33.6667773110577, 33.6591215807971, 33.6549378741848,
33.6592876394984, 33.6624833258972, 33.6556429238423, 33.6564236265265,
33.6680011078708, 33.6591718634038, 33.6684195434343, 33.6642363505652,
33.6599223753418, 33.6575357831156, 33.6607284545898, 33.6602992227631,
33.6646009116676, 33.6569601709497, 33.6597380618501, 33.6610439757783,
33.6709440919706, 33.6553216040898, 33.6567993164063, 33.6604019538554,
33.6604042053223, 33.6607238769535, 33.6704235076904, 33.6597095889516,
33.6546409606935, 33.658124, 33.6659726122235, 33.6550456763148,
33.6591343852459, 33.6648648385784, 33.6633987426758, 33.6695201510491,
33.6609040792869, 33.655895, 33.6606597900391, 33.6629373624763,
33.6612642187822, 33.6580512191781, 33.6635307311956), id = c(8,
14, 12, 12, 7, 12, 8, 12, 10, 12, 14, 12, 6, 14, 10, 12, 8, 4,
14, 14, 2, 10, 2, 12, 9, 5, 12, 12, 5, 10, 12, 14, 2, 14, 14,
14, 12, 12, 14, 8, 12, 10, 12, 14, 6, 12, 12, 14, 4, 3, 12, 10,
14, 8, 14, 14, 12, 10, 8, 14, 12, 3, 10, 12, 12, 12, 14, 6, 14,
9, 12, 10, 12, 14, 14, 14, 14, 10, 12, 12, 14, 12, 14, 14, 12,
12, 4, 8, 14, 2, 14, 14, 11, 10, 3, 12, 8, 14, 3, 12)), row.names = c("25101",
"15358", "80097", "89024", "27479", "98805", "25425", "86335",
"31333", "93483", "12171", "100849", "155418", "12853", "33858",
"100851", "22470", "149448", "7443", "12167", "144934", "33938",
"144132", "91062", "56909", "153781", "95039", "99533", "153760",
"32687", "86913", "12298", "144133", "7402", "11672", "13939",
"78548", "98801", "8135", "24867", "91818", "32609", "95688",
"11675", "155218", "94268", "90367", "7440", "149447", "145506",
"90571", "35105", "14210", "26177", "14975", "16723", "86359",
"34450", "26139", "14198", "89237", "145503", "31062", "92665",
"87694", "78666", "13917", "155219", "15350", "60377", "82820",
"33174", "87056", "7406", "15370", "15356", "9330", "33533",
"86726", "95709", "8131", "96538", "13911", "14229", "86539",
"93482", "145837", "22101", "11305", "144939", "7391", "7445",
"21817", "32804", "145314", "98223", "24175", "8132", "145504",
"87396"), class = "data.frame")
Upvotes: 0
Views: 257
Reputation: 26
Have you tried evaluating the new values using the same basis used in your original data, then use the result as covariates? Note that you will get several warnings because your response is a proportion/probability and you have not given weights or anything like that. Assuming that the model is correctly specified, see if the code below does what you want.
It seems that you can extract the splines from your GAM using the "lpmatrix" type in predict.gam. See here: How to extract fitted splines from a GAM (`mgcv::gam`) I get a bit lost to be honest, so I would suggest just setting up the basis for your covariate without fitting a GAM.
Make sure the number and location of knots makes sense and set up a basis. As I understand it, cSplinesDes() evaluates the basis functions at the desired points
spl_bs <- cSplineDes(x = df$cyclic_term,
knots = seq(min(df$cyclic_term),
max(df$cyclic_term),
length.out = 8),
ord = 4, derivs = 0)
Give reasonable names
colnames(spl_bs) <- paste0("cyclic.", 1:ncol(spl_bs))
You probably want to include these new columns in your data frame. Note that I removed the Intercept term below but you might want to leave it in
df <- cbind(spl_bs, df)
Fit GEE with convenience code to specify model
model <- reformulate(c(-1,
paste0("cyclic.", 1:(ncol(spl_bs))),
"splines::bs(beta1)", "splines::bs(lon+lat)"),
response = "response")
fit1 <- geepack::geeglm(model,
family="binomial",
data=df,
id=as.factor(id),
corstr = "independence")
Now with the predictions. Create new data
predgrid <- data.frame(expand.grid(lat=seq(min(df$lat-0.1), max(df$lat+0.1), length.out = 20),
lon=seq(min(df$lon-0.1), max(df$lon+0.1), length.out = 20),
cyclic_term=seq(min(df$cyclic_term),
max(df$cyclic_term), length.out=20),
beta1=25))
Set the same basis you used for the original data, but now evaluate at the new points
new_spline <- cSplineDes(predgrid$cyclic_term,
knots = seq(min(df$cyclic_term),
max(df$cyclic_term),
length.out = 8),
ord = 4, derivs=0)
colnames(new_spline) <- paste0("cyclic.", 1:ncol(new_spline))
Assemble new data and predict from fit1
newdf <- cbind(new_spline, predgrid)
predict(fit1, newdf)
Note that latitude/longitude spans out of range of original data, which is dangerous
Upvotes: 1