Reputation: 31
I am trying to reimplement an R script in Python from scratch, and it involves logistic regression.
As far as I understand logistic regression (when performing one-vs-all with gradient descent), if there are F features and L labels then we have L x F coefficients. Basically, we have one F-dimensional coefficient vector for each of the L labels; for an incoming input X we compute the sigmoid score under each vector, and whichever vector gives the maximum score is the predicted class.
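To make that concrete, here is a minimal NumPy sketch of the one-vs-all scheme I have in mind (the names X, W, b and the shapes are my own placeholders, not anything from the R code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(X, W, b):
    # X: (N, F) inputs, W: (L, F) one weight vector per label, b: (L,) intercepts
    scores = sigmoid(X @ W.T + b)  # (N, L), one sigmoid score per label
    return scores.argmax(axis=1)   # pick the label whose score is largest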
The logistic regression function in R:
library(rms)  # provides lrm()

try_lrm <- function(datadf, tol = 1e-10, maxit = 1e6) {
  # lrm() fits an ordinal (proportional-odds) logistic regression
  try({
    lrm(y ~ x, data = datadf, penalty = 0, x = TRUE, y = TRUE, tol = tol, maxit = maxit)
  })
}
However, when I run the ordinal regression on the following data frame:
x y
24.03673 2
14.63598 2
26.85079 2
53.45076 1
36.8322 1
42.10773 1
39.68833 1
104.64827 0
114.97038 0
60.8128 0
59.67947 0
I get the following coefficients:
y>=1 y>=2 x
131.440196 75.784904 -2.324528
Since I am implementing everything from scratch, I plan to fit the model with gradient descent.
So how should this output be interpreted? I want to figure out what the sigmoid function should look like, but I am not sure why there is just one coefficient for x when I was expecting one x coefficient for every class. And what are those intercepts?
Does it mean that the sigmoid function looks like this (calling the coefficients k0, k1, k2 for x, y>=1 and y>=2 respectively)?

for y = 0: p = 1/(1 + e^-(k0*x))
for y = 1: p = 1/(1 + e^-(k0*x + k1))
for y = 2: p = 1/(1 + e^-(k0*x + k1 + k2))

And then predict the class with the maximum p?
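In code, the rule I am guessing at would look something like this (just a sketch of my guess above, plugging in the coefficients from the lrm output; I don't know yet whether this interpretation is right):

import math

def sigmoid(z):
    # numerically stable logistic, since k0*x can be a large negative number here
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# coefficients from the lrm output above
k0 = -2.324528   # x
k1 = 131.440196  # y>=1
k2 = 75.784904   # y>=2

def predict(x):
    p = [sigmoid(k0 * x),            # my guess for y = 0
         sigmoid(k0 * x + k1),       # my guess for y = 1
         sigmoid(k0 * x + k1 + k2)]  # my guess for y = 2
    return max(range(3), key=lambda c: p[c])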
Upvotes: 0
Views: 381
Reputation: 576
This appears to be primarily a statistics question; there isn't anything obviously wrong with your R code. In the ordinal model that lrm fits, you should only have one coefficient for x, as reported. For a worked example of ordinal logistic regression in R, see https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/. It uses a different package than the one you are using, but it walks through the statistics as well as the R code.
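To make that concrete: if I am reading the lrm output correctly, it fits a cumulative-logit (proportional-odds) model, P(y >= j) = sigmoid(alpha_j + beta*x), with one shared slope beta and one intercept alpha_j per threshold. Here is a rough Python sketch (Python since that is your target language) of how your three numbers would turn into class probabilities; treat it as an illustration of the model, not your package's internals:

import math

def sigmoid(z):
    # numerically stable logistic
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

beta = -2.324528                 # the single x coefficient
alpha = [131.440196, 75.784904]  # intercepts for y>=1 and y>=2

def class_probs(x):
    p_ge1 = sigmoid(alpha[0] + beta * x)  # P(y >= 1)
    p_ge2 = sigmoid(alpha[1] + beta * x)  # P(y >= 2)
    # differences of adjacent cumulative probabilities give the class probabilities
    return [1.0 - p_ge1, p_ge1 - p_ge2, p_ge2]  # P(y=0), P(y=1), P(y=2)

def predict(x):
    probs = class_probs(x)
    return probs.index(max(probs))

If you run this on your data frame it should predict class 2 for the small x values, class 1 in the middle range, and class 0 for the large ones. The single slope is the proportional-odds assumption: every threshold shares the same effect of x, and only the intercepts differ.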
Upvotes: 1