How to avoid repeating code while creating classes in Python

Question

I am new to designing classes in Python. Assume that we have a small pandas dataframe, df. I want to write a class that accepts this dataframe and has a few methods. In most methods, I want to work with a subset of just 2 columns and 2 rows. Assume that, given the column numbers, the row numbers can be determined. Each method will use this subset. I ended up rewriting the subsetting code in each method, which I am sure is redundant. How do I avoid this?

class Summary(object):
    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def performance(self,col1,col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]
        plt.plot(self.subset.iloc[0],self.subset.iloc[1],'--o')

        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self,col1,col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]

        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # body omitted

I would like to avoid stating the column numbers when instantiating this class, if possible. Also, other_methods may need more than 2 columns anyway, so limiting the data to only two columns may not be efficient, I think. Any thoughts/suggestions?

tdelaney · Accepted Answer

The common part of this code takes column identifiers and turns them into row labels and a subset. Since you want to keep these calculated values on the object, set them in a single helper method.

class Summary(object):

    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def _update_for_columns(self, col1, col2):
        """Given col1 and col2, update self with new values for
        col1, col2, row1, row2 and subset"""
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self,col1,col2):
        self._update_for_columns(col1, col2)
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self,col1,col2):
        self._update_for_columns(col1, col2)
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # body omitted
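
A minimal runnable sketch of the helper approach follows. It rests on a few assumptions that are not in the original: look_up_table is passed to the constructor and stored on the instance, the sample dataframe is invented, and the slope formula is only a placeholder that treats the first subset row as x and the second as y (matching what performance plots).

```python
import pandas as pd


class Summary:
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        # Assumption: the column -> row mapping is supplied by the caller.
        self.look_up_table = look_up_table

    def _update_for_columns(self, col1, col2):
        """Store col1, col2, their looked-up rows and the 2x2 subset on self."""
        self.col1, self.col2 = col1, col2
        self.row1 = self.look_up_table[col1]
        self.row2 = self.look_up_table[col2]
        # A single .loc call selects both the rows and the columns.
        self.subset = self.summary.loc[[self.row1, self.row2], [col1, col2]]

    def get_slope(self, col1, col2):
        self._update_for_columns(col1, col2)
        # Placeholder formula: first subset row is x, second subset row is y.
        x, y = self.subset.iloc[0], self.subset.iloc[1]
        return (y.iloc[1] - y.iloc[0]) / (x.iloc[1] - x.iloc[0])


# Invented sample data for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 8.0]}, index=["r1", "r2"])
summary = Summary(df, look_up_table={"a": "r1", "b": "r2"})
slope = summary.get_slope("a", "b")
```

Every public method now starts with one call to the helper, so the lookup and subsetting logic lives in exactly one place.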

Better yet, since multiple methods want to use the same calculated values, they shouldn't also be setting the values. That results in needless recalculations. Have the caller update them before making any calls.

class Summary(object):

    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def update_for_columns(self, col1, col2):
        """Update for new columns before calling performance, et al."""
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]


    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # body omitted
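
One consequence of this design is that the methods only work after update_for_columns has been called. The sketch below shows one way to make that explicit. The guard method _require_subset, the stand-in method subset_total, the constructor argument look_up_table, and the sample data are all hypothetical and not part of the original code.

```python
import pandas as pd


class Summary:
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        self.look_up_table = look_up_table  # assumed: passed in by the caller
        self.subset = None  # nothing selected yet

    def update_for_columns(self, col1, col2):
        """Select the rows and columns that later calls will work on."""
        self.row1 = self.look_up_table[col1]
        self.row2 = self.look_up_table[col2]
        self.subset = self.summary.loc[[self.row1, self.row2], [col1, col2]]

    def _require_subset(self):
        # Hypothetical guard: fail with a clear message if no columns are selected.
        if self.subset is None:
            raise RuntimeError("call update_for_columns() first")
        return self.subset

    def subset_total(self):
        # Hypothetical stand-in for performance(), get_slope() and so on.
        return float(self._require_subset().to_numpy().sum())


# Invented sample data for illustration only.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)
s = Summary(df, {"a": "r1", "b": "r2", "c": "r3"})

called_too_early = False
try:
    s.subset_total()
except RuntimeError:
    called_too_early = True

s.update_for_columns("a", "c")  # select once...
total = s.subset_total()        # ...then call as many methods as needed
```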

And even better, put the methods that use columns into their own class. That way, two parts of your code sharing one Summary object can't change the selected columns out from under each other while each still thinks it is working on its own selection.

class Summary(object):

    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def get_summary_cols(self, col1, col2):
        return SummaryCols(self, col1, col2)

class SummaryCols(object):

    def __init__(self, summary, col1, col2):
        self.summary = summary # the parent Summary object, assuming you need stuff from it...
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        # summary is a Summary instance, so the dataframe itself is summary.summary
        self.subset = self.summary.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # body omitted
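
To illustrate why the separate class helps, here is a small sketch in which two column selections coexist without interfering. As before, passing look_up_table to the constructor, the labels method, the parent parameter name, and the sample data are assumptions made for the example.

```python
import pandas as pd


class Summary:
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        self.look_up_table = look_up_table  # assumed: passed in by the caller

    def get_summary_cols(self, col1, col2):
        return SummaryCols(self, col1, col2)


class SummaryCols:
    def __init__(self, parent, col1, col2):
        # parent is the Summary object; its dataframe is parent.summary.
        self.row1 = parent.look_up_table[col1]
        self.row2 = parent.look_up_table[col2]
        self.subset = parent.summary.loc[[self.row1, self.row2], [col1, col2]]

    def labels(self):
        # Hypothetical stand-in for performance(), get_slope() and so on.
        return self.row1, self.row2


# Invented sample data for illustration only.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)
s = Summary(df, {"a": "r1", "b": "r2", "c": "r3"})

ab = s.get_summary_cols("a", "b")
ac = s.get_summary_cols("a", "c")  # does not disturb ab
```

Each SummaryCols object carries its own rows and subset, so creating ac leaves ab untouched.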
