Reputation: 633
I'm running a bunch of models with scikit-learn to solve a classification problem.
Here is the code that should do all the running:
for model_name, classifier, param_grid, cv, cv_name in tqdm(zip(model_names, classifiers, param_grids, cvs, cv_names)):
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', classifier)])
    train_and_score_model(model_name, pipeline, param_grid, cv=cv)
My question is: how can I retain the output of my train_and_score_model function? It returns a cv object, i.e. a model.
What I tried, though I don't think it is right, was to create a list cv_names = ['dm_cv', 'lr_cv', 'knn_cv', 'svm_cv', 'dt_cv', 'rf_cv', 'nb_cv'] and assign to each element as the for loop runs. That is what the cv_name loop variable in the for loop head is for.
I don't think that works, though, because I would be assigning to a string rather than to a variable. What I would really need is cv_names = [dm_cv, lr_cv, knn_cv, svm_cv, dt_cv, rf_cv, nb_cv], but I don't think I can build a list like that when those variables don't exist yet.
Another way I thought of is saving each model in a dictionary, where the keys would be the elements of the list I outlined above. I don't know if I can have a model as a dictionary value though.
Here is the clunky, repetitive code I currently run to get the result I want from the for loop:
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_dm)])
dm_cv = train_and_score_model('Dummy Model', pipeline, param_grid_dm)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_lr)])
lr_cv = train_and_score_model('Logistic Regression', pipeline, param_grid_lr)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_knn)])
knn_cv = train_and_score_model('K Nearest Neighbors', pipeline, param_grid_knn)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_svm)])
svm_cv = train_and_score_model('Support Vector Machine', pipeline, param_grid_svm)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_dt)])
dt_cv = train_and_score_model('Decision Tree', pipeline, param_grid_dt)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_rf)])
rf_cv = train_and_score_model('Random Forest', pipeline, param_grid_rf)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_nb)])
nb_cv = train_and_score_model('Naive Bayes', pipeline, param_grid_nb)
Upvotes: 2
Views: 2515
Reputation: 1614
Yes, a dictionary value can be any Python object, including a fitted model. You can create a dictionary that maps each classifier name to its information, i.e. the classifier object and its parameter grid:
models_list = {'Logistic Regression': (classifier_lr, param_grid_lr),
               'K Nearest Neighbours': (classifier_knn, param_grid_knn)}
Iterate through every key-value pair in the dictionary and build your pipelines:
model_cvs = {}
for model_name, model_info in models_list.items():
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', model_info[0])])
    model_cvs[model_name] = train_and_score_model(model_name, pipeline, model_info[1])
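If you would rather keep the parallel lists and the zip loop from the question, the same idea applies: store each return value in a dictionary keyed by the model name instead of trying to assign to a variable named by a string. The following is only a minimal sketch. The train_and_score_model below is a hypothetical stand-in that returns a plain dict, the classifiers are placeholder strings, and a list of steps stands in for Pipeline(steps=...), so that the snippet runs without scikit-learn; your real function and objects would take their place.

```python
# Hypothetical stand-in for the asker's train_and_score_model. The real one
# presumably fits a search over param_grid and returns the fitted object.
def train_and_score_model(model_name, pipeline, param_grid, cv=5):
    return {"name": model_name, "pipeline": pipeline,
            "param_grid": param_grid, "cv": cv}

# Parallel lists, as in the question (placeholder values, not real estimators).
model_names = ["Dummy Model", "Logistic Regression"]
classifiers = ["classifier_dm", "classifier_lr"]
param_grids = [{}, {"classifier__C": [0.1, 1, 10]}]
cvs = [5, 5]

model_cvs = {}  # maps model name -> whatever train_and_score_model returns
for model_name, classifier, param_grid, cv in zip(
        model_names, classifiers, param_grids, cvs):
    # Stand-in for Pipeline(steps=[...]) so the sketch has no dependencies.
    pipeline = [("preprocessor", "preprocessor"), ("classifier", classifier)]
    model_cvs[model_name] = train_and_score_model(
        model_name, pipeline, param_grid, cv=cv)

# Look a result up by name instead of by a separate variable such as lr_cv.
lr_cv = model_cvs["Logistic Regression"]
```

With this layout the cv_names list is no longer needed, since model_cvs['Logistic Regression'] plays the role that the lr_cv variable played in the repetitive version.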
Upvotes: 1