I have the following code. What I want to do is apply a different transformer to each column of a pandas DataFrame. To start, I just want my two columns to pass through without any transformation.
import pandas as pd
from sklearn.compose import ColumnTransformer
df = pd.DataFrame({'pre0': [2, 0, 1, 2], 'pre1': [99, 56, 85, 78]})
column_meta_data = [("p1", "passthrough", "pre0"), ("p2", "passthrough", "pre1")]
column_transformer = ColumnTransformer(transformers=column_meta_data)
X_ = column_transformer.fit_transform(df)
I get the following error:
ValueError: The output of the 'p1' transformer should be 2D (scipy matrix, array, or pandas DataFrame).
This is strange, since each column of the pandas DataFrame should already be one-dimensional. What am I doing wrong here?
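I think the problem is in how the columns are specified. The ColumnTransformer documentation says this about the columns element of each transformer tuple: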
"columns: str, array-like of str, int, array-like of int, array-like of bool, slice or callable. Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above. To select multiple columns by name or dtype, you can use make_column_selector."
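To make the quoted rule concrete, here is a small sketch (assuming a reasonably recent scikit-learn). It first checks the dimensionality of a scalar versus a list selection on the question's DataFrame, then mixes both selection styles in one ColumnTransformer. The docs DataFrame with its text and price columns is a hypothetical example, and CountVectorizer is just one transformer that expects 1D input.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer

df = pd.DataFrame({"pre0": [2, 0, 1, 2], "pre1": [99, 56, 85, 78]})

# A scalar column name selects a 1D Series; a list of names selects a 2D DataFrame.
print(df["pre0"].ndim)    # 1
print(df[["pre0"]].ndim)  # 2

# Hypothetical data: CountVectorizer expects a 1D iterable of strings,
# so this is a case where a scalar column name is the right choice.
docs = pd.DataFrame({
    "text": ["red apple", "green apple", "red pear", "pear"],
    "price": [1.0, 2.0, 3.0, 4.0],
})
ct = ColumnTransformer(
    transformers=[
        ("bow", CountVectorizer(), "text"),  # scalar string -> 1D input
        ("num", "passthrough", ["price"]),   # list of names -> 2D input and output
    ]
)
X = ct.fit_transform(docs)
print(X.shape)  # 4 vocabulary columns + 1 price column
```

The passthrough entry uses a list because, unlike CountVectorizer, it returns its input unchanged and ColumnTransformer needs every transformer's output to be 2D.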
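In practical terms: with a scalar string such as "pre0", the passthrough transformer receives a 1D Series and returns it unchanged, and ColumnTransformer rejects that 1D output. Putting the column name inside a list hands the transformer a 2D DataFrame instead. So instead of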
column_meta_data = [("p1", "passthrough", "pre0"), ("p2", "passthrough", "pre1")]
use this:
column_meta_data = [("p1", "passthrough", ["pre0"]), ("p2", "passthrough", ["pre1"])]
or, since both columns get the same treatment, a single entry:
column_meta_data = [("p1", "passthrough", ["pre0","pre1"])]
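Putting the fix together, here is a minimal runnable sketch, assuming a recent scikit-learn and the question's df. Both list-based variants should produce the same (4, 2) array. The last part takes the asker's stated next step of using different transformers per column; StandardScaler on pre1 is only an illustrative choice, not something from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"pre0": [2, 0, 1, 2], "pre1": [99, 56, 85, 78]})

# One entry per column; each list selection gives a 2D block of shape (4, 1).
column_meta_data = [("p1", "passthrough", ["pre0"]), ("p2", "passthrough", ["pre1"])]
column_transformer = ColumnTransformer(transformers=column_meta_data)
X_ = column_transformer.fit_transform(df)
print(X_.shape)  # (4, 2)

# Single entry covering both columns: same values, same column order.
single = ColumnTransformer(transformers=[("p1", "passthrough", ["pre0", "pre1"])])
X_single = single.fit_transform(df)

# Illustrative next step: keep pre0 unchanged, standardize pre1.
mixed = ColumnTransformer(
    transformers=[
        ("p1", "passthrough", ["pre0"]),
        ("p2", StandardScaler(), ["pre1"]),
    ]
)
X_mixed = mixed.fit_transform(df)
print(X_mixed.shape)  # (4, 2)
```

Because every entry now selects its columns with a list, each transformer sees a 2D block, which is also what most scikit-learn transformers expect, so swapping "passthrough" for another transformer works the same way.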