Reputation: 10400
I would like to create a simple chained pipeline. I found this example:
"""
From https://stackoverflow.com/questions/33658355/piping-output-from-one-function-to-another-using-python-infix-syntax
"""
import collections
def pipe(original):
    """
    """
    class PipeInto(object):
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            self.data['args'] = args
            self.data['kwargs'] = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.data['args'],
                **self.data['kwargs']
            )

        def __call__(self):
            return self.data['function'](
                *self.data['args'],
                **self.data['kwargs']
            )

    return PipeInto


@pipe
def select(df, *args):
    cols = [x for x in args]
    return df[cols]
While df >> select('one') works fine, pipe = select(df, 'one') returns an object that still needs to be called. How can select(df, 'one') work as a plain function call that returns the filtered DataFrame?
Upvotes: 3
Views: 544
Reputation: 77251
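I can think of a solution, with one caveat: the original function must not take a pandas DataFrame as its second positional argument (keyword arguments are fine). In df >> select(other_df), that DataFrame would land in first position of the constructor and trigger the direct call. Let's ditch __call__ and add a __new__ method to the PipeInto class inside the decorator. This new constructor tests whether the first argument is a DataFrame and, if so, calls the original function with the arguments directly. Because __new__ then returns something that is not a PipeInto instance, __init__ is skipped. The snippet assumes import pandas as pd at the top of the module: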
def __new__(cls, *args, **kwargs):
    if args and isinstance(args[0], pd.DataFrame):
        return cls.data['function'](*args, **kwargs)
    return super().__new__(cls)
It seems to work, let me know if you find any downside.
>>> df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
...                    'two' : [4., 3., 2., 1., 3.]})
>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
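One downside worth noting: in the question's decorator, data is a class-level dict, so every pending pipe created from the same function shares its args and kwargs, and the most recently constructed one wins. Below is a minimal sketch of the whole decorator with the __new__ check and per-instance storage. The names pipe_for and data_type are hypothetical, not part of the original code. The piped type is a parameter only so that the sketch runs without pandas; passing pd.DataFrame should give the behaviour described above.

```python
def pipe_for(data_type):
    """Build a pipe decorator that dispatches on data_type (hypothetical helper)."""
    def decorator(original):
        class PipeInto:
            def __new__(cls, *args, **kwargs):
                # Direct call, e.g. select(df, 'one'): run the function now.
                # Returning a non-PipeInto object means __init__ is skipped.
                if args and isinstance(args[0], data_type):
                    return original(*args, **kwargs)
                return super().__new__(cls)

            def __init__(self, *args, **kwargs):
                # Stored per instance, so pending pipes do not share state.
                self.args = args
                self.kwargs = kwargs

            def __rrshift__(self, other):
                # other >> select('one') becomes original(other, 'one')
                return original(other, *self.args, **self.kwargs)

        return PipeInto
    return decorator


# Stand-in data type (a list of row dicts) so the example runs without pandas.
@pipe_for(list)
def select(rows, *cols):
    return [{c: row[c] for c in cols} for row in rows]


rows = [{'one': 1.0, 'two': 4.0}, {'one': 2.0, 'two': 3.0}]
direct = select(rows, 'one')     # plain function call
piped = rows >> select('one')    # pipe syntax
```

Both direct and piped hold the same selection here. The caveat from above still applies: the first argument after the piped value must not itself be an instance of data_type.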
Upvotes: 3