Reputation: 237
How to fit the multivariate fractional polynomial of the following form:
Given a function: y = f(x,z)
, a function of two variables x
and z
. More specifically it is of the form:
y = (x^2 + x^3)/(z^2 + z^3)
Numerator is a polynomial of a 3rd degree of a predictor x, and the denominator is also a polynomial of a 3rd degree of some predictor z.
I would like to fit polynomials for each of the predictor x
and z
, i.e. I need to find the coefficients A,B,C,D:
y = (A*x^2 + B*x^3)/(C*z^2 + D*z^3)
Basically, y
is a ratio of two polynomials of degree 3. How to fit such a function?
Sample of a data frame below. I cannot post full data frame as it has more than 1000 rows.
y = c(-4.10594369806545, -4.23691458506868, -4.24690667936422, -3.53677470254628, -4.30406509320417, -4.19442802077908, -4.66857169733859, -2.82271310942235, -4.19720194766181, 3.52164353473802, -4.3917019001973, -5.41654474791269, 2.87471821731616, -3.85922481986118, -4.25370811223789, -3.57887855889961, -5.33913936106829, -4.11775265312012, -2.89958841300109, -4.18661983833127)
x = c(8.06526520889773, 9.39897529082673,9.07348918922699,7.5522372875608, 9.17294998275762,5.77455154554441, 9.2005930205213, 8.07309119969315, 7.42177579364465,8.18896686364888, 8.07868822922987, 8.50956416425175,9.71269017726113, 7.98378106897745, 7.69893619981345, 8.49576524400262, 8.02224091680654,8.25400859056484, 7.58171964012531, 8.35655484545343)
z = c(2.56494935746154, 4.99043258677874, 4.43081679884331,3.66356164612965,4.53259949315326,1.79175946922805,4.23410650459726, 5.38449506278909,3.13549421592915,4.34380542185368, 3.43398720448515,2.77258872223978,6.94985645500077,3.97029191355212, 3.40119738166216,4.39444915467244,2.19722457733622,3.91202300542815,4.06044301054642, 3.87120101090789)
dat = data.frame(cbind(y=y,x=x,z=z))
UPDATE:
Call to nls
:
nls(y~(a*(x**2) + b*(x**3))/(c*(z**2) + d*(z**3)), dat, start=list(a=1,b=1,c=1,d=1))
Upvotes: 1
Views: 396
Reputation: 3364
This is a nice problem that you have here. At least I learned some things from it. However, I have a feeling that this question resloves more around the solution of this specific task (university assignment?) than to be a general question.
But let me share an approach to the solution: What we have here
can be simplyfied to
Solving to y^\theta becomes numerically more managable.
As we may see (and after trying very hard and failing to solve a non linear problem) it is actually a division of two linear models. So one approach is to estimate coefficients of the two problems separately. That is we fix coefficients a
and b
to find c
and d
and afterwards use c
and d
to find a
and b
.
Following code solves for coefficients
First some data
library(dplyr)
sampleData <- data.frame(x = runif(100, -100, 100), z = runif(100, -100, 100)) %>%
mutate(y = ( (-2 * x^2) + (5 * x^3) ) / (-4 * z^2 + 6 * z^3)) %>%
mutate(zxfactor = z^2/x^2,
yy = y * zxfactor)
now we solve for yy
. With some random starting values...
init2 <- structure(runif(4, -10, 10), names=c("A", "B", "C", "D"))
coefab <- init2[c("A", "B")]
coefcd <- init2[c("C", "D")]
... we need to fit a linear model for a
and b
by
and a linear model for c
and d
by
# don't use for loop but determine a terminal condition... but i'm too lazy :-)
for(i in 1:100) {
# make linear prediction using coeff. c and d
sampleData <- sampleData %>%
mutate(yab = yy * (coefcd[1] + coefcd[2] * z))
# and fit a model for a and b
coefab <- coef(lm(yab ~ x, sampleData))
# then make a linear prediction using coeff. a and b
sampleData <- sampleData %>%
mutate(ycd = (coefab[1] + coefab[2] * x) / yy)
# and fit a model for c and d
coefcd <- coef(lm(ycd ~ z, sampleData))
} # repeat until satisfied
coefab
coefcd
Are we happy with the coefficients we have found ? Lets check:
optimFun <- function(params, out, x, z) {
res <- (params[1] + params[2]*x)/(params[3] + params[4]*z)
return( sqrt(sum( (out - res)^2 )) )
}
optimFun(c(coefab, coefcd), x = sampleData$x, z = sampleData$z, out = sampleData$yy)
> optimFun(c(coefab, coefcd), x = sampleData$x, z = sampleData$z, out = sampleData$yy)
[1] 1.951043e-12
Indeed we are since the difference between the modelled function estimates and data yy
(the scaled one) is close to zero. Different iterations result in different parameter estimates since the problem is overdetermined. (Maybe someone can explain it in more detail)
Comments:
nls
optim
you can use the optimFun
. And for faster convergence you may even define derivate functionlm(yyz ~ x, data = sampleData %>% mutate(yyz = yy*(-4 + 6*z)))
which will return exact values for a = -2
and b = 5
. Given any two parameters you can find a matching pair that minimizes the function.Upvotes: 2