Mokus

Reputation: 10400

How can I create a chain pipeline?

I would like to create a simple chain pipeline. I found this example:

"""
From https://stackoverflow.com/questions/33658355/piping-output-from-one-function-to-another-using-python-infix-syntax
"""

def pipe(original):
    """Wrap `original` so it can be used as `value >> original(...)`."""

    class PipeInto(object):
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            self.data['args'] = args
            self.data['kwargs'] = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.data['args'],
                **self.data['kwargs']
            )

        def __call__(self):
            return self.data['function'](
                *self.data['args'],
                **self.data['kwargs']
            )

    return PipeInto

@pipe
def select(df, *args):
    cols = list(args)
    return df[cols]

While df >> select('one') works fine, pipe = select(df, 'one') returns an object that still needs to be called. How can I make select(df, 'one') work as a plain function call that returns the filtered DataFrame?

Upvotes: 3

Views: 544

Answers (1)

Paulo Scardine

Reputation: 77251

Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas DataFrame (keyword arguments are OK). Otherwise the pipe form, df >> func(other_df), would be mistaken for a direct call. Let's ditch the __call__ and add a __new__ method to our PipeInto class inside the decorator. This new constructor tests whether the first argument is a DataFrame, and if it is, we just call the original function with the arguments:

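# Assumes `import pandas as pd` at module level.
def __new__(cls, *args, **kwargs):
    # Direct call such as select(df, 'one'): run the function immediately.
    if args and isinstance(args[0], pd.DataFrame):
        return cls.data['function'](*args, **kwargs)
    # Otherwise build the pipe object for use with `>>`.
    return super().__new__(cls)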
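
For context, here is a minimal sketch of how this method might sit inside the full decorator. To stay runnable without pandas, it makes two assumptions that differ from the snippet above: the type to dispatch on is passed in as a parameter (pipe(list) here; pipe(pd.DataFrame) would be the equivalent for the original use case), and a list of row dicts stands in for the DataFrame. It also keeps the arguments on the instance instead of in the class-level data dict.

```python
def pipe(data_type):
    """Decorator factory (hypothetical signature): `data_type` is the type
    whose presence as first argument means "this is a direct call"."""
    def decorator(function):
        class PipeInto:
            def __new__(cls, *args, **kwargs):
                # Direct call, e.g. select(rows, 'one'): run the function now.
                # Python skips __init__ because the result is not a PipeInto.
                if args and isinstance(args[0], data_type):
                    return function(*args, **kwargs)
                # Pipe form, e.g. select('one'): build an object that waits for >>.
                return super().__new__(cls)

            def __init__(self, *args, **kwargs):
                self.args = args
                self.kwargs = kwargs

            def __rrshift__(self, other):
                # Called for `other >> self` when `other` does not handle >> itself.
                return function(other, *self.args, **self.kwargs)

        return PipeInto
    return decorator


@pipe(list)  # stand-in for pd.DataFrame, used only to keep the sketch stdlib-only
def select(rows, *cols):
    return [{c: row[c] for c in cols} for row in rows]


rows = [{'one': 1.0, 'two': 4.0}, {'one': 2.0, 'two': 3.0}]
direct = select(rows, 'one')    # plain function call
piped = rows >> select('one')   # pipe form
```

Both direct and piped hold the same filtered rows. The caveat still applies: a function whose first piped-in argument is itself of data_type would be treated as a direct call.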

It seems to work; let me know if you find any downsides.

>>> df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
                       'two' : [4., 3., 2., 1., 3.]})

>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0

>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
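
One downside worth noting comes from the question's class rather than from __new__: data is a class attribute, so every pipe object created from the same decorated function shares one dict. A later select('two') would overwrite the arguments of an earlier select('one') that had been stored in a variable. Here is a minimal sketch of the effect, using the hypothetical class names SharedState and PerInstanceState, which are not part of the original code:

```python
class SharedState:
    data = {}  # class attribute: one dict shared by every instance

    def __init__(self, *args):
        self.data['args'] = args  # mutates the shared dict


class PerInstanceState:
    def __init__(self, *args):
        self.args = args  # instance attribute: each object keeps its own


first_shared = SharedState('one')
second_shared = SharedState('two')
shared_args = first_shared.data['args']  # overwritten by the second object

first_own = PerInstanceState('one')
second_own = PerInstanceState('two')
own_args = first_own.args  # unaffected by the second object
```

For one-off expressions like df >> select('one') the sharing goes unnoticed, since the object is used immediately. It only matters when pipe objects are created first and applied later.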

Upvotes: 3
