Reputation: 6713
This may be a very basic question (and I can remove it if there are objections to it).
Suppose I have a function that I reuse multiple times in various projects:
def sort_clean(x, sort_cols):
    x.sort(sort_cols, inplace=True)
    x.reset_index(inplace=True, drop=True)
I want to make this part of the pandas module, so that whenever I do import pandas and define a dataframe myDf, myDf.sort_clean is available as a method on that dataframe. Is this possible?
Upvotes: 1
Views: 103
Reputation: 77424
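You can subclass DataFrame:
import pandas

class NewDataFrame(pandas.DataFrame):
    def sort_clean(self, sort_cols):
        # Older pandas spelled this self.sort(...); it was removed in 0.20 in favor of sort_values.
        self.sort_values(sort_cols, inplace=True)
        self.reset_index(inplace=True, drop=True)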
For example:
In [25]: class NewDataFrame(pandas.DataFrame):
   ....:     def sort_clean(self, sort_cols):
   ....:         self.sort(sort_cols, inplace=True)
   ....:         self.reset_index(inplace=True, drop=True)
   ....:
In [26]: dfrm
Out[26]:
          A         B         C
0  0.382531  0.287066  0.345749
1  0.725201  0.450656  0.336720
2  0.146883  0.266518  0.011339
3  0.111154  0.190367  0.275750
4  0.757144  0.283361  0.736129
5  0.039405  0.643290  0.383777
6  0.632230  0.434664  0.094089
7  0.658512  0.368150  0.433340
8  0.062180  0.523572  0.505400
9  0.287539  0.899436  0.194938
[10 rows x 3 columns]
In [27]: my_df = NewDataFrame(dfrm)
In [28]: my_df.sort_clean(["B", "C"])
In [29]: my_df
Out[29]:
          A         B         C
0  0.111154  0.190367  0.275750
1  0.146883  0.266518  0.011339
2  0.757144  0.283361  0.736129
3  0.382531  0.287066  0.345749
4  0.658512  0.368150  0.433340
5  0.632230  0.434664  0.094089
6  0.725201  0.450656  0.336720
7  0.062180  0.523572  0.505400
8  0.039405  0.643290  0.383777
9  0.287539  0.899436  0.194938
[10 rows x 3 columns]
But be aware that any function or method that returns a new DataFrame object will not automatically return a NewDataFrame.
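One way to soften that limitation is to override the _constructor property, which pandas consults when it builds the result of many operations. The following is a minimal sketch, assuming a current pandas version (hence sort_values rather than sort); the sample data is invented for illustration, and not every pandas operation is guaranteed to preserve the subclass.

```python
import pandas as pd


class NewDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Ask pandas to build results of DataFrame operations as NewDataFrame.
        return NewDataFrame

    def sort_clean(self, sort_cols):
        # Sort in place, then replace the index with a fresh 0..n-1 range.
        self.sort_values(sort_cols, inplace=True)
        self.reset_index(inplace=True, drop=True)


# Hypothetical sample data, not taken from the session above.
df = NewDataFrame({"A": [3, 1, 2], "B": [0.3, 0.1, 0.2]})
subset = df.head(2)      # a derived frame that keeps the subclass
subset.sort_clean(["A"])
```

Without the _constructor override, df.head(2) would come back as a plain DataFrame and the call to sort_clean would raise AttributeError.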
Normal monkey-patching (e.g. just creating a new attribute on an existing DataFrame instance, like df.sort_clean = sort_clean) is tricky, because the function needs the instance supplied as its first argument, especially since you do in-place mutation. A plain function assigned to an instance is not bound to it, so for every dataframe you would have to use functools.partial, or a lambda with a default argument, to achieve the same thing:
df.sort_clean = lambda sort_cols, x=df: sort_clean(x, sort_cols)
Note that with the lambda approach, the argument that has a default must come last (in Python, parameters with default values must follow parameters without them). You can avoid this by using functools.partial instead:
import functools
df.sort_clean = functools.partial(sort_clean, df)
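For completeness, here is a runnable sketch of the per-instance patch, followed by a class-level variant. The class-level assignment is an assumption on my part and not part of the answer above: it attaches the function to pandas.DataFrame itself, so Python binds it as a regular method on every dataframe. The sample data is invented, and sort_values is used because DataFrame.sort no longer exists in current pandas.

```python
import functools

import pandas as pd


def sort_clean(x, sort_cols):
    # Sort in place, then replace the index with a fresh 0..n-1 range.
    x.sort_values(sort_cols, inplace=True)
    x.reset_index(inplace=True, drop=True)


# Per-instance patch: only this one object gains the attribute.
df = pd.DataFrame({"B": [3, 1, 2], "C": [9, 8, 7]})
df.sort_clean = functools.partial(sort_clean, df)
df.sort_clean(["B"])

# Class-level patch (assumption, see above): every DataFrame gains the method,
# and the instance is passed automatically as the first argument.
pd.DataFrame.sort_clean = sort_clean
other = pd.DataFrame({"B": [2, 1], "C": [5, 6]})
other.sort_clean(["B"])
```

Patching the class affects every piece of code in the process that uses pandas, so it is worth choosing a name that is unlikely to collide with a column label or a future pandas method.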
Upvotes: 4