scikit-learn pipelines: Normalising after PCA produces undesired random results

Question

I am running a pipeline that normalises the inputs, runs PCA, normalises PCA factors before finally running a logistic regression.

However, I am getting variable results on the confusion matrix I produce.

I am finding that, if I remove the 3rd step ("normalise_pca" ), my results are constant.

I have set random_state=0 for all the pipeline steps I can. Any idea why I am getting variable results?

def exp2_classifier(X_train, y_train):

    estimators = [('robust_scaler', RobustScaler()), 
                  ('reduce_dim', PCA(random_state=0)), 
                  ('normalise_pca', PowerTransformer()), #I applied this as the distribution of the PCA factors were skew
                  ('clf', LogisticRegression(random_state=0, solver="liblinear"))] 
                #solver specified here to suppress warnings, it doesn't seem to effect gridSearch
    pipe = Pipeline(estimators)

    return pipe

exp2_eval = Evaluation().print_confusion_matrix
logit_grid = Experiment().run_experiment(asdp.data, "heavy_drinker", exp2_classifier, exp2_eval);

scikit-learn pipelines: Normalising after PCA produces undesired random results

Answers (1)

Related Questions