Reputation: 4640
From the documentation I already read that:
A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.
However, I don't understand what use this function has. Could anybody explain the purpose of this function?
Upvotes: 6
Views: 14173
Reputation: 5992
Let's say you have image arrays with a known value range of 0-255 that you want to scale down to 0-1, but you don't want to use MinMaxScaler, because not all images will actually contain the values 0 and 255, so the observed min/max would vary from image to image. In simpler terms: no one scored 100% on the test, but you still want to scale against a fixed maximum of 100.
from sklearn.preprocessing import FunctionTransformer
import numpy as np

data = np.array([[100, 2], [240, 80], [139, 10], [10, 150]])

def div255(X):
    return X / 255  # encode

def mult255(X):
    return X * 255  # decode

scaler = FunctionTransformer(div255, inverse_func=mult255)

# --- encode ---
mutated = scaler.fit_transform(data)
"""
array([[0.39215686, 0.00784314],
[0.94117647, 0.31372549],
[0.54509804, 0.03921569],
[0.03921569, 0.58823529]])
"""
# --- decode ---
scaler.inverse_transform(mutated)
"""
array([[100., 2.],
[240., 80.],
[139., 10.],
[ 10., 150.]])
"""
Make sure you define these custom functions in a place where they can be referenced by the rest of your program (e.g. a module of helper functions), especially for when it comes time to inverse_transform your predictions and/or encode new samples!
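One concrete reason for that advice: a fitted transformer can only be pickled if the wrapped functions are importable by name. A minimal sketch (the function name here is just illustrative): a module-level function serializes fine, while a lambda does not.

```python
import pickle

import numpy as np
from sklearn.preprocessing import FunctionTransformer

def div255(X):  # module-level, so pickle can find it by name
    return X / 255

ok = FunctionTransformer(div255)
payload = pickle.dumps(ok)  # works
print(pickle.loads(payload).transform(np.array([255.0])))  # [1.]

bad = FunctionTransformer(lambda X: X / 255)
try:
    pickle.dumps(bad)  # a lambda has no importable name
except Exception as err:
    print(type(err).__name__)
```

The same constraint applies when saving whole pipelines with joblib, which is the usual way models are shipped to production.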
Upvotes: 4
Reputation: 19169
In addition to simply wrapping a given user-defined function, FunctionTransformer provides the standard methods of other sklearn estimators (e.g., fit and transform). The benefit of this is that you can introduce arbitrary, stateless transforms into an sklearn Pipeline, which combines multiple processing stages. This makes executing a processing pipeline easier, because you can simply pass your data (X) to the fit and transform methods of the Pipeline object without having to apply each stage of the pipeline individually.
Here is an example copied from the sklearn documentation (located here), with the imports added so it runs on its own:
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def all_but_first_column(X):
    return X[:, 1:]

def drop_first_component(X, y):
    """
    Create a pipeline with PCA and the column selector and use it to
    transform the dataset.
    """
    pipeline = make_pipeline(
        PCA(), FunctionTransformer(all_but_first_column),
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    pipeline.fit(X_train, y_train)
    return pipeline.transform(X_test), y_test
Note that the first principal component wasn't explicitly removed from the data. The pipeline automatically chains the transformations together when pipeline.transform is called.
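To see that chaining in action, here is a quick sketch of calling such a helper on synthetic data (the random data, shapes, and the fixed random_state are just for illustration; the helper is redefined so the snippet runs on its own):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def all_but_first_column(X):
    return X[:, 1:]

def drop_first_component(X, y):
    pipeline = make_pipeline(PCA(), FunctionTransformer(all_but_first_column))
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    pipeline.fit(X_train, y_train)
    return pipeline.transform(X_test), y_test

rng = np.random.RandomState(0)
X = rng.rand(100, 5)          # 100 samples, 5 features
y = rng.randint(0, 2, 100)

X_new, y_test = drop_first_component(X, y)
print(X_new.shape)  # (25, 4): PCA keeps all 5 components, the transformer drops one
```

A single call runs both stages: PCA fits and transforms first, then FunctionTransformer slices off the first column of PCA's output.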
Upvotes: 13
Reputation: 8144
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Use a NumPy array so X**2 is applied elementwise
# (FunctionTransformer does not convert plain lists by default).
X = np.array([[5, 6, 7],
              [8, 9, 10],
              [1, 2, 3]])

def exampleFunctionTransformer(X):
    return X ** 2

def exampleofFunctionTransfor():
    fx = FunctionTransformer(exampleFunctionTransformer)
    Y1 = fx.transform(X)
    print(Y1)
    return Y1

Z = exampleofFunctionTransfor()
print(Z)
Output:
[[ 25  36  49]
 [ 64  81 100]
 [  1   4   9]]
Upvotes: -1
Reputation: 7576
Here is a nice example. It really is what it says: given X input, it applies your function to X and returns the result. The most important part of it is its statelessness. Here and here you can find what statelessness is and here you can read a discussion about its advantages.
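To make the stateless point concrete, here is a small sketch contrasting FunctionTransformer with a stateful transformer such as StandardScaler (the toy data is just illustrative):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[10.0]])

# Stateful: fit() learns the training mean and std,
# and transform() uses those learned parameters.
stateful = StandardScaler().fit(X_train)
print(stateful.transform(X_test))   # result depends on what fit() saw

# Stateless: fit() learns nothing, so transform()
# depends only on the input it is given.
stateless = FunctionTransformer(np.log1p).fit(X_train)
print(stateless.transform(X_test))  # np.log1p(10.0), regardless of X_train
```

Because no state is learned, the stateless transform gives identical results whether or not fit was ever called, which is exactly what makes it safe to reuse across datasets.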
Upvotes: 2