Claudiu Creanga

Reputation: 8366

Better way to structure a series of df manipulations in your class

How can you better structure the code in a class so that it returns the df you want, without a main method that calls a lot of other methods in sequential order? I arrive at this structure in a lot of situations and it seems bad. I have a df that I just overwrite with the result of other base functions (which I unit test) until I get what I want.

class A:
    def main(self):
        df = self.load_file_into_df()
        df = self.add_x_columns(df)
        df = self.calculate_y(df)
        df = self.calculate_consequence(df)
        ...
        return df

    def add_x_columns(self, df): ...
    def calculate_y(self, df): ...
    def calculate_consequence(self, df): ...
    ...

# now use it somewhere else
df = A().main()

Upvotes: 5

Views: 133

Answers (1)

jpp

Reputation: 164683

pipe

One feature you may wish to utilize is pd.DataFrame.pipe. This is considered "pandorable" because it facilitates method chaining.

In my opinion, you should separate reading data into a dataframe from manipulating the dataframe. For example:

class A:
    def main(self):
        df = self.load_file_into_df()

        # Each step takes a dataframe and returns a dataframe.
        df = (df.pipe(self.add_x_columns)
                .pipe(self.calculate_y)
                .pipe(self.calculate_consequence))

        return df
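As a rough sketch of the separation mentioned above, the caller can load the data and hand it to the class, which then only transforms it. The column names (`a`, `x`, `y`) and the bodies of the two steps are made up for illustration; only `DataFrame.pipe` and `DataFrame.assign` are real pandas calls.

```python
import pandas as pd


def add_x_columns(df):
    # Hypothetical step: add a derived column without mutating the input.
    return df.assign(x=df["a"] * 2)


def calculate_y(df):
    # Hypothetical step: relies on the column added by the previous step.
    return df.assign(y=df["x"] + 1)


class A:
    def __init__(self, df):
        # The caller loads the data; the class only transforms it.
        self.df = df

    def main(self):
        # Steps run left to right, in the order they are written.
        return self.df.pipe(add_x_columns).pipe(calculate_y)


raw = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for a file load
result = A(raw).main()
```

Because each step returns a new dataframe rather than modifying its argument, the steps stay easy to unit test in isolation and `raw` is left untouched.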

compose

Function composition is not native to Python, but the third-party toolz library does offer this feature. This allows you to lazily define chained functions. Note the reversed order of operations, i.e. the last argument of compose is applied first.

from toolz import compose

class A:
    def main(self):
        df = self.load_file_into_df()

        # Applied right to left: add_x_columns runs first.
        transformer = compose(self.calculate_consequence,
                              self.calculate_y,
                              self.add_x_columns)

        df = df.pipe(transformer)

        return df

In my opinion, compose offers a flexible and adaptable solution. You can, for example, define any number of compositions and apply them selectively or repeatedly at various points in your workflow.
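To make the ordering and the reuse concrete, here is a minimal sketch with a hand-rolled `compose` built on `functools.reduce`. It is only a stand-in for `toolz.compose` so the snippet runs without the extra dependency, and `add_one` and `double` are hypothetical placeholder steps operating on integers rather than dataframes.

```python
from functools import reduce


def compose(*funcs):
    # Stand-in for toolz.compose: the rightmost function is applied first.
    def composed(value):
        return reduce(lambda acc, f: f(acc), reversed(funcs), value)
    return composed


def add_one(n):
    return n + 1


def double(n):
    return n * 2


add_then_double = compose(double, add_one)  # double(add_one(n))
double_then_add = compose(add_one, double)  # add_one(double(n))

first = add_then_double(3)
second = double_then_add(3)

# A composition is itself a function, so it can be composed again and reused.
twice = compose(add_then_double, add_then_double)(3)
```

Swapping the argument order changes the result, which is why the order of arguments to compose deserves a comment in real pipelines.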

Upvotes: 2
