Reputation: 8366
How do you better structure the code in your class so that your class returns the df
that you want, but you don't have a main method which calls a lot of other methods in sequential order. I find that in a lot of situations I arrive at this structure and it seems bad. I have a df
that I just overwrite it with the result of other base functions (that I unit test) until I get what I want.
class A:
def main(self):
df = self.load_file_into_df()
df = self.add_x_columns(df)
df = self.calculate_y(df)
df = self.calculate_consequence(df)
...
return df
def add_x_columns(df)
def calculate_y(df)
def calculate_consequence(df)
...
# now use it somewhere else
df = A().main()
Upvotes: 5
Views: 133
Reputation: 164683
One feature you may wish to utilize is pd.DataFrame.pipe
. This is considered "pandorable" because it facilitates operator chaining.
In my opinion, you should separate reading data into a dataframe from manipulating the dataframe. For example:
class A:
def main(self):
df = self.load_file_into_df()
df = df.pipe(self.add_x_columns)\
.pipe(self.calculate_y)\
.pipe(self.calculate_consequence)
return df
Function composition is not native to Python, but the 3rd party toolz
library does offer this feature. This allows you to lazily define chained functions. Note the reversed order of operations, i.e. the last argument of compose
is performed first.
from toolz import compose
class A:
def main(self)
df = self.load_file_into_df()
transformer = compose(self.calculate_consequence,
self.calculate_y,
self.add_x_columns)
df = df.pipe(transformer)
return df
In my opinion, compose
offers a flexible and adaptable solution. You can, for example, define any number of compositions and apply them selectively or repeatedly at various points in your workflow.
Upvotes: 2