Reputation: 559
I am new to designing classes in Python. Assume that we have a small pandas dataframe, df. I want to write a class that accepts this dataframe and has a few methods. In most methods, I want to work with a subset of just 2 columns and 2 rows. Assume that, given the column numbers, the row numbers can be determined. Each method uses this subset. I ended up rewriting the subsetting code in each method, which I am sure is redundant. How do I avoid this?
class Summary(object):
    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def performance(self,col1,col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]
        plt.plot(self.subset.iloc[0],self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self,col1,col2):
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # code for other stuff
I would like to avoid stating the column numbers when instantiating this class, if possible. Also, other_methods may need more than 2 columns anyway, so limiting the data to only two columns may not be efficient, I think.
Any thoughts/suggestions?
Upvotes: 0
Views: 732
Reputation: 26
I'm not 100% sure what you're trying to accomplish, since I'm not familiar with the underlying data, so feel free to dismiss my answer if it's not helpful.
You could place the 'subsetting' portion of the code in __init__, so that the data transformations common to all the methods are done once, in one place, when you instantiate the class.
For example:
class Summary(object):
    def __init__(self,summary_df, col1, col2):
        self.summary = summary_df
        #look_up_table={dict created}
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0],self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # code for other stuff
Using self stores these values on the instance, so every method of the class can access them.
edit: More examples
Assuming you're using pandas, you can pass the entire dataframe (or a relevant subset of it) to the class:
class Summary(object):
    def __init__(self, summary_df, data_df):
        self.summary = summary_df
        #look_up_table = {dict created}
        self.data_df = data_df

    # If it'll always be two columns
    def subset_df(self, some_col, another_col):
        # takes a vertical slice of the original df
        self.data_subset = self.data_df[[some_col, another_col]]
        # keep the column labels themselves; look_up_table is keyed by label
        self.col1 = some_col
        self.col2 = another_col
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]
Now you can call it with:
do_summary = Summary(my_summary_df, my_data_df)
do_summary.subset_df('column_name1', 'column_name2')
print(do_summary.subset)
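As a side note, the two-step selection (rows first, then columns) can usually be written as a single .loc call that takes a row list and a column list. Here is a minimal sketch of that idea. The data, the look_up_table contents, and the select_subset helper are all made up for illustration, and I'm assuming look_up_table maps each column label to a row label, as in the question:

```python
import pandas as pd

# Hypothetical data: labels and values are invented for this example.
summary_df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]},
    index=["r1", "r2", "r3"],
)

# Hypothetical mapping from a column label to its matching row label.
look_up_table = {"a": "r1", "b": "r2", "c": "r3"}


def select_subset(df, lookup, *cols):
    """Return the rows looked up from cols, restricted to those columns."""
    cols = list(cols)
    rows = [lookup[c] for c in cols]
    # One .loc call with a list of row labels and a list of column labels.
    return df.loc[rows, cols]


subset = select_subset(summary_df, look_up_table, "a", "b")
print(subset)
```

Because the helper accepts any number of column labels, the same function would also cover a method that needs three columns.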
Upvotes: 0
Reputation: 77357
The common portion of this code is to take column identifiers and turn them into row plus subset references. Since you want to keep these calculated values on the object, set them in a single helper function.
class Summary(object):
    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def _update_for_columns(self, col1, col2):
        """Given col1 and col2, update self with new values for
        col1, col2, row1, row2 and subset"""
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self,col1,col2):
        self._update_for_columns(col1, col2)
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self,col1,col2):
        self._update_for_columns(col1, col2)
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # code for other stuff
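To make the helper idea concrete, here is a small runnable sketch. Several things in it are assumptions rather than anything from the question: the sample data, passing look_up_table into the constructor, and the slope formula, which I guessed from the plot call (row1 values as x, row2 values as y). Plotting is left out so the example runs without matplotlib.

```python
import pandas as pd

# Hypothetical data and lookup table, invented for this example.
summary_df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [3.0, 6.0, 9.0]},
    index=["r1", "r2", "r3"],
)
look_up_table = {"a": "r1", "b": "r2"}


class Summary:
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        # Assumption: the lookup table is handed in rather than built here.
        self.look_up_table = look_up_table

    def _update_for_columns(self, col1, col2):
        """Store col1, col2, row1, row2 and subset on self, in one place."""
        self.col1, self.col2 = col1, col2
        self.row1 = self.look_up_table[col1]
        self.row2 = self.look_up_table[col2]
        self.subset = self.summary.loc[[self.row1, self.row2], [col1, col2]]

    def get_slope(self, col1, col2):
        self._update_for_columns(col1, col2)
        x = self.subset.iloc[0]  # values from row1
        y = self.subset.iloc[1]  # values from row2
        # Assumed definition: slope of the line through the two (x, y) points.
        return (y.iloc[1] - y.iloc[0]) / (x.iloc[1] - x.iloc[0])


summary = Summary(summary_df, look_up_table)
print(summary.get_slope("a", "b"))  # 2.0
```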
Better yet, since multiple methods want to use the same calculated values, they shouldn't also be setting the values. That results in needless recalculations. Have the caller update them before making any calls.
class Summary(object):
    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def update_for_columns(self, col1, col2):
        """Update for new columns before calling performance, et al."""
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        self.subset = self.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # code for other stuff
And even better, put the methods that use columns into their own class so that you don't risk having two parts of your code think they are operating on the same thing.
class Summary(object):
    def __init__(self,summary_df):
        self.summary = summary_df
        #look_up_table={dict created}

    def get_summary_cols(self, col1, col2):
        return SummaryCols(self, col1, col2)

class SummaryCols(object):
    def __init__(self, summary, col1, col2):
        self.summary = summary # assuming you need stuff from summary...
        self.col1 = col1
        self.col2 = col2
        self.row1 = look_up_table[self.col1]
        self.row2 = look_up_table[self.col2]
        # summary is the Summary object, so its dataframe is summary.summary
        self.subset = self.summary.summary.loc[[self.row1,self.row2]][[self.col1,self.col2]]

    def performance(self):
        plt.plot(self.subset.iloc[0], self.subset.iloc[1],'--o')
        plt.xlabel(self.row1)
        plt.ylabel(self.row2)
        plt.show()

    def get_slope(self):
        # code for calculating slope
        return slope

    def other_methods(self,col1,col2,col3):
        pass  # code for other stuff
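Here is a runnable sketch of the separate-class version, extended to accept any number of columns, since the question mentions a method that needs three. The sample data, the look_up_table argument, the variable-length get_summary_cols signature, and the row_means method are all hypothetical stand-ins, not part of the original code:

```python
import pandas as pd

# Hypothetical data and lookup table, invented for this example.
summary_df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]},
    index=["r1", "r2", "r3"],
)
look_up_table = {"a": "r1", "b": "r2", "c": "r3"}


class Summary:
    def __init__(self, summary_df, look_up_table):
        self.summary = summary_df
        self.look_up_table = look_up_table

    def get_summary_cols(self, *cols):
        """Return an independent view object for the given columns."""
        return SummaryCols(self, *cols)


class SummaryCols:
    def __init__(self, parent, *cols):
        self.cols = list(cols)
        self.rows = [parent.look_up_table[c] for c in self.cols]
        # The subset is computed once, when the view is created.
        self.subset = parent.summary.loc[self.rows, self.cols]

    def row_means(self):
        # Placeholder for a real calculation such as the slope.
        return self.subset.mean(axis=1)


summary = Summary(summary_df, look_up_table)
ab = summary.get_summary_cols("a", "b")
abc = summary.get_summary_cols("a", "b", "c")
print(ab.subset.shape, abc.subset.shape)  # (2, 2) (3, 3)
```

Each call returns its own object, so creating abc does not change what ab holds, which is the point of splitting the class.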
Upvotes: 1