Gustavomoty

Reputation: 87

Apply PCA per group of features in a scikit-learn Pipeline instead of to all features

I have a dataframe with 100 features that I am using for a clustering problem. The features are divided into 3 blocks, N1, N2 and N3, and each feature carries its group as a suffix. For example, feature names might be:

umidity_n1, air_n1, lat_n2, long_n2, etc.

Right now my pipeline applies PCA to the whole data, but I would like PCA applied per group: one PCA for features with the _n1 suffix, one for features with the _n2 suffix, and another for features with the _n3 suffix.

My pipeline is working as:

## Pipeline
prepData = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=20, random_state=42)),
    ]
)

kModel = Pipeline(
    [
        (
            "kmeans",
            KMeans(
                n_clusters=6,
                init="k-means++",
                n_init=20,
                max_iter=100,
                random_state=42,
            ),
        ),
    ]
)

pipe = Pipeline(
    [
        ("prepData", prepData),
        ("kModel", kModel)
    ]
)

Any ideas how to split the PCA procedure by blocks of variables inside the above pipeline?

Upvotes: 0

Views: 983

Answers (1)

StupidWolf

Reputation: 46968

You can use ColumnTransformer to transform each group of columns with a separate PCA. As described in the help page for ColumnTransformer, you pass the indices of the columns each PCA should transform; below I use the following to get the columns with the _n1 suffix:

np.where(df.columns.str.contains('_n1'))[0]
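As a quick check of what that expression returns (a toy frame just to illustrate; the column names are placeholders matching the question):

```python
import numpy as np
import pandas as pd

# Empty frame: only the column labels matter for index selection
df = pd.DataFrame(columns=['umidity_n1', 'air_n1', 'lat_n2', 'long_n2'])

# Boolean mask over column names -> positional indices of the _n1 columns
idx_n1 = np.where(df.columns.str.contains('_n1'))[0]
print(idx_n1)  # [0 1]
```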

An example data:

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

df = pd.DataFrame(np.random.uniform(0, 1, (100, 6)),
                  columns=['umidity_n1', 'air_n1', 'a_n1',
                           'lat_n2', 'long_n2', 'b_n2'])

Set up the column transformer and pipeline:

pca = PCA(n_components=2)

pca_by_column = ColumnTransformer(transformers=[
    ('pca_n1', pca, np.where(df.columns.str.contains('_n1'))[0]),
    ('pca_n2', pca, np.where(df.columns.str.contains('_n2'))[0])
    ],
    remainder='passthrough')

prepData = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ('pca', pca_by_column)
])
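To verify the transformer splits as intended, you can fit it on the toy frame and check the output shape: two components per block, with no columns left over for the passthrough (a self-contained sketch of the setup above):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data mirroring the example: three _n1 columns and three _n2 columns
df = pd.DataFrame(np.random.uniform(0, 1, (100, 6)),
                  columns=['umidity_n1', 'air_n1', 'a_n1',
                           'lat_n2', 'long_n2', 'b_n2'])

# One PCA per suffix group, selected by column position
pca_by_column = ColumnTransformer(transformers=[
    ('pca_n1', PCA(n_components=2), np.where(df.columns.str.contains('_n1'))[0]),
    ('pca_n2', PCA(n_components=2), np.where(df.columns.str.contains('_n2'))[0]),
], remainder='passthrough')

prepData = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('pca', pca_by_column),
])

out = prepData.fit_transform(df)
print(out.shape)  # (100, 4): 2 components from each block, nothing passed through
```

This `prepData` can then replace the prep step in the question's pipeline, with the KMeans step unchanged.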

Upvotes: 2
