tumbleweed

Reputation: 4640

What is scikit-learn FunctionTransformer used for?

From the documentation I already read that:

A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.

However, I don't understand what this transformer is actually useful for. Could anybody explain its purpose?

Upvotes: 6

Views: 14173

Answers (4)

Kermit

Reputation: 5992

Custom Function Use Case

Let's say you have image arrays with a known value range of 0-255 that you want to scale down to 0-1, but you don't want to use MinMaxScaler, because not every image actually contains the values 0 and 255, so a scaler fitted to the data would learn the wrong range. In simpler terms: no one scored 100% on the test, but you still want to scale the scores against the full 0-100 range.

from sklearn.preprocessing import FunctionTransformer
import numpy as np

data = np.array([[100, 2], [240, 80], [139, 10], [10, 150]])

def div255(X):
    return X / 255  # encode: 0-255 -> 0-1

def mult255(X):
    return X * 255  # decode: 0-1 -> 0-255

scaler = FunctionTransformer(div255, inverse_func=mult255)

# --- encode ---
mutated = scaler.fit_transform(data)
"""
array([[0.39215686, 0.00784314],
       [0.94117647, 0.31372549],
       [0.54509804, 0.03921569],
       [0.03921569, 0.58823529]])
"""

# --- decode ---
scaler.inverse_transform(mutated)
"""
array([[100.,   2.],
       [240.,  80.],
       [139.,  10.],
       [ 10., 150.]])
"""

Pro Tip

Make sure you define these custom functions at module level, somewhere the rest of your program can reference them (e.g. a helpers module), especially for when it comes time to inverse_transform your predictions and/or encode new samples! A transformer built from a lambda or a locally defined function cannot be pickled along with your model.
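
Here is a minimal sketch of why this matters: pickle saves functions by reference, so a FunctionTransformer built from a module-level function can be serialized and reloaded, while one built from a lambda cannot be pickled at all.

import pickle
from sklearn.preprocessing import FunctionTransformer

def div255(X):
    return X / 255  # module-level, so pickle can find it by name

# Works: the pickle stores a reference to div255, not its bytecode.
pickle.dumps(FunctionTransformer(div255))

# Fails: a lambda has no importable name.
try:
    pickle.dumps(FunctionTransformer(lambda X: X / 255))
except Exception as e:
    print(type(e).__name__)  # PicklingError (or AttributeError)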

Upvotes: 4

bogatron

Reputation: 19169

In addition to simply wrapping a given user-defined function, the FunctionTransformer provides some standard methods of other sklearn estimators (e.g., fit and transform). The benefit of this is that you can introduce arbitrary, stateless transforms into an sklearn Pipeline, which combines multiple processing stages. This makes executing a processing pipeline easier because you can simply pass your data (X) to the fit and transform methods of the Pipeline object without having to explicitly apply each stage of the pipeline individually.

Here is an example copied directly from the sklearn documentation:

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def all_but_first_column(X):
    return X[:, 1:]

def drop_first_component(X, y):
    """
    Create a pipeline with PCA and the column selector and use it to
    transform the dataset.
    """
    pipeline = make_pipeline(
        PCA(), FunctionTransformer(all_but_first_column),
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    pipeline.fit(X_train, y_train)
    return pipeline.transform(X_test), y_test

Note that the first principal component wasn't explicitly removed from the data. The pipeline automatically chains the transformations together when pipeline.transform is called.
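
A quick usage sketch (the iris dataset here is just a stand-in for any X and y): the pipeline fits PCA on the training split, and a single transform call both projects the test data and drops the first component.

from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_test_reduced, y_test = drop_first_component(X, y)

# PCA() keeps all 4 iris features by default; the selector drops one.
print(X_test_reduced.shape)  # (n_test_samples, 3)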

Upvotes: 13

backtrack

Reputation: 8144

import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Use an ndarray: X ** 2 would fail on a plain list of lists.
X = np.array([[5, 6, 7],
              [8, 9, 10],
              [1, 2, 3]])

def exampleFunctionTransformer(X):
    return X ** 2  # square every element

def exampleofFunctionTransfor():
    fx = FunctionTransformer(exampleFunctionTransformer)
    Y1 = fx.transform(X)
    return Y1

Z = exampleofFunctionTransfor()
print(Z)


Output:

[[ 25  36  49]
 [ 64  81 100]
 [  1   4   9]]

Refer to the docs: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html#sklearn.preprocessing.FunctionTransformer

Upvotes: -1

lte__

Reputation: 7576

FunctionTransformer really is what it says: given an X input, it applies your function to X and returns the result. Its most important property is statelessness: it learns nothing from the data during fit, so the same function is applied no matter what the transformer has previously seen.
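
A minimal sketch of what stateless means here: fit learns nothing, so transform depends only on its input, unlike e.g. StandardScaler, whose output depends on the data it was fitted on.

import numpy as np
from sklearn.preprocessing import FunctionTransformer, StandardScaler

log_tf = FunctionTransformer(np.log1p)

a = np.array([[1.0], [10.0]])
b = np.array([[100.0], [1000.0]])

# Stateless: fitting (on anything, or not at all) changes nothing.
print(log_tf.fit(a).transform(b))
print(log_tf.transform(b))  # same result, no fit needed

# Stateful, for contrast: the output depends on the fitted data.
print(StandardScaler().fit(a).transform(b))
print(StandardScaler().fit(b).transform(b))  # different result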

Upvotes: 2
