Reputation: 1031
I've been tasked with refactoring a bunch of old code. One thing I want to fix is that there's a class of objects with methods for direct database access, even though the objects themselves are representations of Excel worksheets. So, in my mind there should be a separate class of objects to mediate database access.
We're using helper functions called "handlers", which take a list of parameters and return data from the database. The ExcelWorksheet class implements a method for running handlers asynchronously, and each subclass uses one or more handlers. The result can be quite hard to follow, imo:
data_handler = {
    # We always use facebook here, so prepend that argument
    "metrics": lambda *args: metrics_handler(*["facebook"] + list(args)),
    "top_content": lambda *args: content_handler(*["facebook"] + list(args)),
}
@property
def data_requirements(self):
    return [
        ["metrics", self.start, self.end, [self.mapping_id], [self.category_id], self.metrics],
        ["top_content", self.start, self.end, "total_actions", "user", [self.mapping_id], 10],
        ["top_content", self.start, self.end, "total_actions", "category", [self.category_id], 10],
        ["top_content", self.start, self.end, "comments_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "comments_count", "category", [self.category_id], 5],
        ["top_content", self.start, self.end, "shares_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "shares_count", "category", [self.category_id], 5],
    ]
@staticmethod
def _call_async(fns):
    """
    Call the fns in a thread pool asynchronously and return a
    generator of results
    """
    futures = []
    with CloseConnectionThreadPoolExecutor(max_workers=4) as executor:
        for indx, f in enumerate(fns):
            LOGGER.warning("Getting data dependency %s out of %s", indx + 1, len(fns))
            future = executor.submit(f)
            futures.append(future)
    return (i.result() for i in futures)
def require_data(self):
    """
    Returns a generator of all the required data specified by the
    worksheets in this workbook, in the order specified.
    Calls self.data_handler to get back a function for the given
    key. If the requirement is a list, the first item is the key
    and subsequent items are passed as arguments to the function
    returned.
    Data requirement format:
    [
        'resource_1',
        'resource_2',
        ['resource_3', 'arg1', 'arg2'],
        ['resource_3', 'arg3', 'arg4']
    ]
    """
    if not self.data_handler:
        raise NotImplementedError("No data_handler specified")
    # For each requirement, route to the data handler
    data_fns = []
    for r in self.data_requirements:
        # A requirement can be specified as a list where the
        # first item is the resource and the rest are
        # arguments to the function to fulfill the requirement
        if isinstance(r, list):
            key = r[0]
            args = r[1:]
            fn = self.data_handler[key](*args)
        else:
            fn = self.data_handler[r]
        data_fns.append(fn)
    return self._call_async(data_fns)
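To illustrate how much indirection is involved, here is roughly what resolving the first requirement boils down to (assuming, since _call_async submits the results to a thread pool, that each handler call returns a zero-argument callable):

# Hypothetical trace of routing one requirement; worksheet stands in for
# an instance of an ExcelWorksheet subclass
requirement = ["metrics", worksheet.start, worksheet.end,
               [worksheet.mapping_id], [worksheet.category_id], worksheet.metrics]
key, args = requirement[0], requirement[1:]
# data_handler["metrics"] is the lambda above, which prepends "facebook",
# so this is equivalent to metrics_handler("facebook", *args)
fn = worksheet.data_handler[key](*args)
# fn is then submitted to the thread pool via executor.submit(fn)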
So I came to the conclusion that I should create a separate DataSource object. Another thing I don't like about this code is the way these handlers are called; it seems really complicated and hard to understand. I would like to be able to use my DataSource object like this:
datasource.add_request_for_metrics(arg1, arg2, arg3, ...)
datasource.add_request_for_top_content(arg1, arg2, arg3, ...)
results = datasource.execute_requests()
I could define all these methods by hand, but there are a whole bunch of handlers, and more could be added in the future. So I thought about using a metaclass that would define an "add_request_for_x" method for every given handler. This is what I came up with so far:
class DataSourceMeta(type):
    """
    This is a metaclass for defining a datasource class.

    data_handler_mappings - a dict that defines which handlers to use when
    looking for data. For example, given
    {"metrics": metrics_handler, "time_series": time_series_handler},
    class objects will use metrics_handler to search for the "metrics"
    data category and time_series_handler to search for the "time_series"
    data category.

    For every item in the dictionary the metaclass will add an accordingly
    named method for adding a new request. In this case the methods will
    be named "add_request_for_metrics" and "add_request_for_time_series"
    ("add_request_for_{data_category}"). These requests can then be
    executed by calling the "execute_requests" method, which should return
    a list of results.

    Each class using this meta should implement the following interface:
    add_request(): adds a request to be executed
    _run_requests(): defines how requests are run (e.g. sync vs async)
    _clean_up(): performs any necessary cleanup after running the
                 "execute_requests" method (clear the list of requests?)
    """
    def __new__(cls, *args, **kwargs):
        # Default to an empty dict so classes created without the keyword
        # (e.g. subclasses) don't crash on .items()
        data_handler_mappings = kwargs.get("data_handler_mappings", {})
        newclass = super(DataSourceMeta, cls).__new__(cls, *args)
        for data_category, data_category_handler in data_handler_mappings.items():
            method_name = "add_request_for_" + data_category.lower()

            # Bind the handler via a factory; a closure over the loop
            # variable would late-bind, and every generated method would
            # end up calling the handler from the last iteration
            def make_method(handler):
                def method(self, *args, **kwargs):
                    self.add_request(handler(*args, **kwargs))
                return method

            setattr(newclass, method_name, make_method(data_category_handler))

        def execute_requests(self):
            results = self._run_requests()
            self._clean_up()
            return results

        newclass.execute_requests = execute_requests
        return newclass
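For context, this is roughly how I picture a concrete class being declared with this metaclass. The __init__/add_request/_run_requests bodies are just a sketch, and I'm again assuming each handler call returns a zero-argument callable:

class FacebookDataSource(metaclass=DataSourceMeta,
                         data_handler_mappings={"metrics": metrics_handler,
                                                "top_content": content_handler}):
    def __init__(self):
        self._requests = []

    def add_request(self, request):
        self._requests.append(request)

    def _run_requests(self):
        # sketch: run each queued zero-argument callable in turn
        # (this could reuse the existing thread pool helper instead)
        return [request() for request in self._requests]

    def _clean_up(self):
        self._requests = []

datasource = FacebookDataSource()
datasource.add_request_for_metrics(arg1, arg2, arg3, ...)  # generated by the metaclass
results = datasource.execute_requests()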
Is this a good approach? Is there anything I can do better? I'm only a junior programmer, and this would be my first time using a metaclass in a serious project.
Upvotes: 1
Views: 109
Reputation: 579
This answer covers the option of avoiding a metaclass altogether.
My understanding from your question is that you want to improve the readability of your code and also make it suitable for a production system (a "serious project").
In my opinion, the case you describe can avoid a metaclass, and is better off doing so in a production system, because non-dynamic classes give you better type safety. Additionally, I think generating methods dynamically the way you propose makes it harder to understand at a glance what a given "datasource" can request, i.e. it lowers readability.
The main reason I think this is the case is that you have a dictionary of methods used to fetch data (the data_handler). My understanding is that these methods are fixed and you are combining them in specific ways for different databases (Excel sheets).
My suggestion in this case would be to use a series of mixins.
There is a limitation here: judging from the code snippet provided, if different databases require different methods to get data (i.e. the data handlers need to be customised), this approach may not be ideal. But if only a handful of arguments are source-specific, they can be added as class variables to the mixin.
import abc

class BaseDataSourceMixin(abc.ABC):
    @abc.abstractmethod
    def _run_requests(self):
        """Defines how requests are run (e.g. sync vs async)."""
        raise NotImplementedError()

    @abc.abstractmethod
    def _clean_up(self):
        """Performs any necessary cleanup after running "execute_requests" (clear the list of requests?)."""
        raise NotImplementedError()

    def execute_requests(self):
        """Runs the queued requests, cleans up, and returns the results."""
        results = self._run_requests()
        self._clean_up()
        return results

class RequestMixin(abc.ABC):
    # abc.abstractproperty is deprecated; stacking property and
    # abstractmethod is the modern equivalent
    @property
    @abc.abstractmethod
    def data_source_name(self) -> str:
        pass

class MetricsRequest(RequestMixin):
    def add_request_for_metrics(self, *args):  # can use meaningful variable names
        pass

class TopContentRequest(RequestMixin):
    def add_request_for_top_content(self, *args):
        print(self.data_source_name)  # can be used here, and the subclass will define it

class FacebookDataSource(
    BaseDataSourceMixin,
    # Now when looking at the code we can just see what we are inheriting
    # and know what methods to expect in this class
    MetricsRequest,
    TopContentRequest,
):
    @property
    def data_source_name(self) -> str:
        # This example returns a string, but I might suggest using a NamedTuple,
        # especially if there is a specific interface you are using
        # that always requires the same number/type of variables
        return "facebook"

    def _run_requests(self):
        return

    def _clean_up(self):
        return
datasource = FacebookDataSource()
datasource.add_request_for_metrics(arg1, arg2, arg3, ...)
datasource.add_request_for_top_content(arg1, arg2, arg3, ...)
results = datasource.execute_requests()
Using it in this way, we can easily get docstrings from the inherited methods and have them exposed in our IDE, making it easier for a developer reading and understanding the code.
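For completeness, here is a minimal sketch of what the placeholder bodies might look like. It assumes, as in the question's code, that each handler call returns a zero-argument callable, and it keeps queued requests on a _requests list (both of these are assumptions, not part of the original interface):

from concurrent.futures import ThreadPoolExecutor

class MetricsRequest(RequestMixin):
    def add_request_for_metrics(self, *args):
        """Queue a metrics request; the data source name is prepended, as in the original data_handler."""
        # metrics_handler is the existing handler from the question, assumed
        # to return a zero-argument callable that fetches the data
        self._requests.append(metrics_handler(self.data_source_name, *args))

class FacebookDataSource(BaseDataSourceMixin, MetricsRequest, TopContentRequest):
    def __init__(self):
        self._requests = []

    @property
    def data_source_name(self) -> str:
        return "facebook"

    def _run_requests(self):
        # mirror the original _call_async: run the queued callables in a thread pool
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = [executor.submit(fn) for fn in self._requests]
        return [future.result() for future in futures]

    def _clean_up(self):
        self._requests = []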
Upvotes: 1