P A N

Reputation: 5922

How to use GridSearchCV with MultiOutputClassifier(MLPClassifier) Pipeline

I am trying out scikit-learn for the first time, for a Multi-Output Multi-Class text classification problem. I am attempting to use GridSearchCV to optimize the parameters of MLPClassifier for this purpose.

I will admit that I am shooting in the dark here, having no prior experience. Please let me know if this makes sense.

Below is what I currently have:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

df = pd.read_csv('data.csv')

df.fillna('', inplace=True)  # Replace NaNs with '' (an empty string is itself a valid label in this multi-class setup)

x_features = df['input_text']
y_labels = df[['output_text_label_1', 'output_text_label_2']]

x_train, x_test, y_train, y_test = train_test_split(x_features, y_labels, test_size=0.3, random_state=7)

pipe = Pipeline(steps=[('cv', CountVectorizer()),
                       ('mlpc', MultiOutputClassifier(MLPClassifier()))])

pipe.fit(x_train, y_train)

pipe.score(x_test, y_test)

pipe.score(x_test, y_test) gives a score of ~0.837, which suggests the code above is doing something. Running pipe.predict() on some test strings also seems to yield reasonably adequate results.
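
For illustration (with a made-up input string):

print(pipe.predict(['some example input text']))
# e.g. [['label_1_value' 'label_2_value']] -- one predicted label per output column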

However, even after looking at plenty of examples, I don't understand how to implement GridSearchCV for this Pipeline. (Additionally, I would like advice on which parameters to search.)

I doubt it makes sense to post my attempts with GridSearchCV, since they have been varied and all unsuccessful. As a brief example, one attempt adapted from a Stack Overflow answer was:

grid = [
        {
        'activation' : ['identity', 'logistic', 'tanh', 'relu'],
        'solver' : ['lbfgs', 'sgd', 'adam'],
        'hidden_layer_sizes': [(100,),(200,)]
        }
       ]

grid_search = GridSearchCV(pipe, grid, scoring='accuracy', n_jobs=-1)

grid_search.fit(x_train, y_train)

This gives the error:

ValueError: Invalid parameter activation for estimator Pipeline(steps=[('cv', CountVectorizer()), ('mlpc', MultiOutputClassifier(estimator=MLPClassifier()))]). Check the list of available parameters with estimator.get_params().keys().

I'm not sure what causes this, nor exactly how to utilize estimator.get_params().keys() to figure out which parameters are faulty.

Perhaps my use of 'cv', CountVectorizer() or 'mlpc', MultiOutputClassifier(estimator=MLPClassifier()) is incorrect in relation to the grid parameters.

I believe I need to use CountVectorizer() here because my inputs (and desired label outputs) are all strings.

I would very much appreciate an example of how GridSearchCV should be used with a Pipeline that combines CountVectorizer() and MLPClassifier correctly, and advice on which grid parameters are worth searching.

Upvotes: 3

Views: 2566

Answers (1)

Sanjar Adilov

Reputation: 1099

TL;DR Try something like this:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

mlpc = MLPClassifier(solver='adam',
                     learning_rate_init=0.01,
                     max_iter=300,
                     activation='relu',
                     early_stopping=True)
# with_mean=False is required here: CountVectorizer outputs a sparse matrix,
# and StandardScaler cannot center sparse data without densifying it
pipe = Pipeline(steps=[('cv', CountVectorizer(ngram_range=(1, 1))),
                       ('scale', StandardScaler(with_mean=False)),
                       ('mlpc', MultiOutputClassifier(mlpc))])
search_space = {
    'cv__max_df': (0.9, 0.95, 0.99),
    'cv__min_df': (0.01, 0.05, 0.1),
    'mlpc__estimator__alpha': 10.0 ** -np.arange(1, 5),
    'mlpc__estimator__hidden_layer_sizes': ((64, 32), (128, 64),
                                            (64, 32, 16), (128, 64, 32)),
    'mlpc__estimator__tol': (1e-3, 5e-3, 1e-4),
}
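
To actually run the search, a minimal sketch (reusing pipe and search_space from above together with your x_train and y_train; I left scoring at its default, since scoring='accuracy' cannot handle multiclass-multioutput targets, while the default falls back to the pipeline's own score, i.e. exact-match accuracy across both outputs):

from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(pipe, search_space, n_jobs=-1)  # default 5-fold CV
grid_search.fit(x_train, y_train)

print(grid_search.best_params_)  # best parameter combination found
print(grid_search.best_score_)   # its mean cross-validated score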

Discussion:

  1. [Edit] MLPClassifier supports multi-output classification natively only when every output is binary (i.e., multi-label problems). If that applies to you, and your outputs are interrelated, I wouldn't recommend MultiOutputClassifier: it trains a separate MLPClassifier per output without taking the relationships between outputs into account. Training a single MLPClassifier is faster, cheaper, and usually more accurate.
  2. The ValueError is caused by improper parameter grid names: inside a Pipeline, every hyperparameter has to be addressed as <step name>__<parameter>, and through estimator for anything wrapped in MultiOutputClassifier. See Nested parameters, and the short example after this list.
  3. With a modest workstation and/or large training data, set solver='adam' to use a cheaper, first-order method instead of the second-order 'lbfgs'. Alternatively, try solver='sgd' (even cheaper to compute), but then also tune momentum. I anticipate that your data will be sparse and on different scales after CountVectorizer, and momentum / solver='adam' is a way to cope with such noisy, varying gradients.
  4. Insert one of the standardization transformers after CountVectorizer, as MLPs are sensitive to feature scaling (I guess StandardScaler will work better; note the with_mean=False in the code above, which keeps the bag-of-words matrix sparse). That said, solver='adam' would probably handle an imbalanced bag of words well on its own; still, I believe it won't hurt to standardize your data.
  5. I think tuning activation is needless. Set activation='relu'.
  6. Use early_stopping=True, specify a large enough max_iter, and tune tol to prevent overfitting.
  7. Definitely tune learning_rate_init with solver='sgd'; for solver='adam', I assume higher learning rates will be OK and adam won't require comprehensive learning-rate tuning.
  8. Prefer deeper nets to wider ones (e.g., hidden_layer_sizes=(128, 64, 32) to hidden_layer_sizes=(256, 192)).
  9. Always tune alpha.
  10. The optimal hidden_layer_sizes may depend on the dimensionality of the document-term matrix.
  11. Try setting higher batch_size values, but take the computational expense into account.
  12. If you wish to optimize CountVectorizer, tune max_df and min_df but not ngram_range; I believe an MLP with at least two hidden layers will capture unigram relationships itself, without needing explicit n-grams.
  13. Optimize the hyperparameters in the code sample above first. But note that the remaining hyperparameters can also affect both computational performance and predictive power.
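
To make point 2 concrete, here is a small sketch (using the pipe defined above): every key in the grid must be prefixed with the Pipeline step name, joined by double underscores, and parameters of the MLPClassifier wrapped in MultiOutputClassifier additionally go through estimator. get_params() lists the exact names:

# Every parameter name a grid is allowed to reference for this pipeline:
print(sorted(pipe.get_params().keys()))
# ...contains e.g. 'cv__max_df', 'mlpc__estimator__activation', ...

# The grid from the question therefore works once its keys are prefixed:
grid = [{
    'mlpc__estimator__activation': ['identity', 'logistic', 'tanh', 'relu'],
    'mlpc__estimator__solver': ['lbfgs', 'sgd', 'adam'],
    'mlpc__estimator__hidden_layer_sizes': [(100,), (200,)],
}]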

Disclaimer: Most of the remarks are based on my (insubstantial 🤔) assumptions about your data and pertain only to scikit-learn's MLPs. Refer to the docs to learn more about neural networks and experiment with other tips. And remember, There is No Free Lunch.

Upvotes: 3
