Reputation: 1370
New to python and sklearn so apologies in advance. I have two transformers and I would like to gather the results in a `FeatureUnion (for a final modelling step at the end). This should be quite simple but FeatureUnion is stacking the outputs rather than providing an nx2 array or DataFrame. In the example below I will generate some data that is 10 rows by 2 columns. This will then generate two features that are 10 rows by 1 column. I would like the final feature union to have 10 rows and 1 column but what I get are 20 rows by 1 column.
I will try to demonstrate with my example below:
some imports
import numpy as np
import pandas as pd
from sklearn import pipeline
from sklearn.base import TransformerMixin
some random data
df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])
a custom transformer that selects a column
class Trans(TransformerMixin):
def __init__(self, col_name):
self.col_name = col_name
def fit(self, X):
return self
def transform(self, X):
return X[self.col_name]
a pipeline that uses the transformer twice (in my real case I have two different transformers but this reproduces the problem)
pipe = pipeline.FeatureUnion([
('select_a', Trans('a')),
('select_b', Trans('b'))
])
now i use the pipeline but it returns an array of twice the length
pipe.fit_transform(df).shape
(20,)
however I would like an array with dimensions (10, 2).
Quick fix?
Upvotes: 2
Views: 665
Reputation: 36555
The transformers in the FeatureUnion
need to return 2-dimensional matrices, however in your code by selecting a column, you are returning a 1-dimensional vector. You could fix this by selecting the column with X[[self.col_name]]
.
Upvotes: 3