Markus

Reputation: 3782

How to get coefficients of multinomial logistic regression?

I need to calculate the coefficients of a multinomial logistic regression using sklearn:

X =

x1          x2          x3   x4         x5    x6
0.300000    0.100000    0.0  0.0000     0.5   0.0
0.000000    0.006000    0.0  0.0000     0.2   0.0
0.010000    0.678000    0.0  0.0000     2.0   0.0
0.000000    0.333000    1.0  12.3966    0.1   4.0
0.200000    0.005000    1.0  0.4050     1.0   0.0
0.000000    0.340000    1.0  15.7025    0.5   0.0
0.000000    0.440000    1.0  8.2645     0.0   4.0
0.500000    0.055000    1.0  18.1818    0.0   4.0

The values of y are categorical in range [1; 4].

y =

1
2
1
3
4
1
2
3

This is what I do:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

h = .02

logreg = linear_model.LogisticRegression(C=1e5)

logreg.fit(X, y)

# print the coefficients
print(logreg.intercept_)
print(logreg.coef_)

However, I get 6 columns in the output of logreg.intercept_ and 6 columns in the output of logreg.coef_. How can I get 1 coefficient per feature, e.g. the values a to f?

y = a*x1 + b*x2 + c*x3 + d*x4 + e*x5 + f*x6

Also, probably I am doing something wrong, because y_pred = logreg.predict(X) gives me the value of 1 for all rows.

Upvotes: 1

Views: 5159

Answers (2)

beenjaminnn

Reputation: 794

As of 2025, this is not possible with sklearn (I spent a while looking for a solution).

This is a valid task in multi-class logistic regression. You are asking for a single set of coefficients that simultaneously explains all of the classes. Sklearn (and other packages) fit a separate set of coefficients for each class, which is why coef_ is a 4x6 matrix (n_classes x n_features): you have 4 classes and 6 features.
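To make the distinction concrete, the two parameterizations can be written side by side. This is a sketch in standard notation rather than a quote from either library's documentation; the symbols \beta_k, b_k, \beta, x_{ik} and \alpha_k are introduced here for illustration.

```latex
% Per-class coefficients (what sklearn fits):
% one coefficient vector \beta_k and one intercept b_k for each class k
P(y_i = k \mid x_i) =
  \frac{\exp(\beta_k^\top x_i + b_k)}{\sum_{j=1}^{K} \exp(\beta_j^\top x_i + b_j)}

% Shared coefficients (the model asked for):
% a single vector \beta, features x_{ik} that vary by alternative k, and an
% alternative-specific constant \alpha_k (fixed to 0 for the reference alternative)
P(y_i = k \mid x_{i1}, \dots, x_{iK}) =
  \frac{\exp(\beta^\top x_{ik} + \alpha_k)}{\sum_{j=1}^{K} \exp(\beta^\top x_{ij} + \alpha_j)}
```

In the second form, \beta can only be identified if the features differ across alternatives. If every alternative saw the same x_i, the term \beta^\top x_i would cancel in the ratio.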

The PyLogit package can fit your model. There is an example of performing logistic regression on a dataset with 4 input features and 4 targets - see Specify and Estimate a Multinomial Logit (MNL) Model.

Alternatively, this is a PyTorch implementation I made of the same model:

import torch
from torch import nn


class MultinomialLogit(nn.Module):
    def __init__(self, n_feats, n_alts, ref_alt=0):
        super().__init__()
        self.n_feats = n_feats  # number of features
        self.n_alts = n_alts  # number of alternatives
        self.ref_alt = ref_alt  # reference alternative

        self.coeffs = nn.Linear(self.n_feats, 1, bias=False)
        self.biases = nn.Parameter(torch.zeros(1, self.n_alts - 1))
        self.bias_mask = torch.ones(self.n_alts, dtype=bool)
        self.bias_mask[self.ref_alt] = False

    # expects a batch of features batch_size x n_alts x n_feats
    def forward(self, feats):
        # a bigger batch of features where each alternative is a sample, i.e. (batch_size * n_alts) x n_feats
        feats_flat = feats.flatten(0, 1)
        util_flat = self.coeffs(feats_flat)  # utils (batch_size * n_alts) x 1
        util = util_flat.reshape(-1, self.n_alts)  # batch size x n alts
        util[:, self.bias_mask] += self.biases  # bias (ASC) is a constant relative to the reference alternative
        return util

    def train_step(self, feats, labels):
        utils = self.forward(feats)
        loss = nn.functional.cross_entropy(utils, labels, reduction="sum")
        return loss

    def get_params(self):
        params = dict(self.named_parameters())
        return {"asc": params["biases"].detach(), "beta": params["coeffs.weight"].detach()}
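The class above defines the loss but not the optimization loop. As a rough sketch of how such a model might be trained, here is the same parameterization (one shared coefficient vector plus constants relative to alternative 0) written functionally. The data is random and purely illustrative, and the names `beta`, `asc`, the optimizer choice and the learning rate are all assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-in data (an assumption, not the question's data):
# every sample has one feature vector per alternative.
batch_size, n_alts, n_feats = 32, 4, 6
feats = torch.randn(batch_size, n_alts, n_feats)
labels = torch.randint(0, n_alts, (batch_size,))  # index of the chosen alternative

beta = torch.zeros(n_feats, requires_grad=True)    # one coefficient per feature
asc = torch.zeros(n_alts - 1, requires_grad=True)  # constants for alternatives 1..n_alts-1
optimizer = torch.optim.Adam([beta, asc], lr=0.05)

losses = []
for _ in range(200):
    optimizer.zero_grad()
    utils = feats @ beta  # shape: batch_size x n_alts
    # Alternative 0 is the reference, so its constant is fixed at zero.
    utils = utils + torch.cat([torch.zeros(1), asc])
    loss = F.cross_entropy(utils, labels, reduction="sum")
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(beta.detach())  # the single set of feature coefficients
print(asc.detach())   # alternative-specific constants
```

Because the labels here are random, the fitted values carry no meaning; the point is only that `beta` has one entry per feature regardless of the number of classes.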

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210982

Check the online documentation:

coef_ : array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

As @Xochipilli has already mentioned in the comments, you are going to have (n_classes, n_features) coefficients, or in your case (4, 6), and 4 intercepts (one for each class).
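A quick way to see these shapes is to fit the model on the data from the question and inspect the attributes. The sketch below also recomputes the per-class linear scores by hand to show how the rows of coef_ are used; `max_iter=10000` is an added assumption to give the solver room to converge.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data copied from the question
X = np.array([
    [0.30, 0.100, 0.0,  0.0000, 0.5, 0.0],
    [0.00, 0.006, 0.0,  0.0000, 0.2, 0.0],
    [0.01, 0.678, 0.0,  0.0000, 2.0, 0.0],
    [0.00, 0.333, 1.0, 12.3966, 0.1, 4.0],
    [0.20, 0.005, 1.0,  0.4050, 1.0, 0.0],
    [0.00, 0.340, 1.0, 15.7025, 0.5, 0.0],
    [0.00, 0.440, 1.0,  8.2645, 0.0, 4.0],
    [0.50, 0.055, 1.0, 18.1818, 0.0, 4.0],
])
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

logreg = LogisticRegression(C=1e5, max_iter=10000)
logreg.fit(X, y)

print(logreg.classes_)          # the class label that each row of coef_ belongs to
print(logreg.coef_.shape)       # (4, 6): one row per class, one column per feature
print(logreg.intercept_.shape)  # (4,): one intercept per class

# Each class k gets its own linear score coef_[k] . x + intercept_[k];
# the predicted class is the one with the highest score.
scores = X @ logreg.coef_.T + logreg.intercept_
pred = logreg.classes_[scores.argmax(axis=1)]
```

Row k of coef_ therefore holds the a to f values for class logreg.classes_[k], not for y as a whole.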

Probably I am doing something wrong, because y_pred = logreg.predict(X) gives me the value of 1 for all rows.

Yes, you shouldn't evaluate your model on the same data you used to train it. Split your data into training and test sets, train the model on the training set, and check its accuracy on the test set.
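A minimal sketch of that workflow follows. The 8 rows in the question are too few to split sensibly (class 4 appears only once), so this uses synthetic data from `make_classification` as a stand-in; the sample size, split ratio and `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 6 features, 4 classes
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out 25% of the rows; stratify keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Accuracy is measured only on rows the model has not seen during fitting
test_accuracy = accuracy_score(y_test, logreg.predict(X_test))
print(test_accuracy)
```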

Upvotes: 3
