Reputation: 2097
I am working on a structured data analysis framework based on streaming data between nodes. Currently, nodes are implemented as subclasses of the root Node class provided by the framework. For each node class/factory I need metadata, such as a list of the node's attributes, their descriptions, and the node's output. The metadata serve two audiences: end users of a front-end application, and programs such as other stream management tools. In the future there will be more of them.
(Note that I had just started learning Python while writing that code.)
Currently the metadata are provided in a class variable:
class AggregateNode(base.Node):
    """Aggregate"""
    __node_info__ = {
        "label" : "Aggregate Node",
        "description" : "Aggregate values grouping by key fields.",
        "output" : "Key fields followed by aggregations for each aggregated field. Last field is "
                   "record count.",
        "attributes" : [
            {
                "name": "keys",
                "description": "List of fields according to which records are grouped"
            },
            {
                "name": "record_count_field",
                "description": "Name of a field where record count will be stored. "
                               "Default is `record_count`"
            }
        ]
    }
More examples can be found here.
I feel that it can be done in a much cleaner way. There is one restriction: as nodes are custom subclasses, there should be minimal interference with potential future attribute names.
What I was thinking of doing was splitting the current __node_info__. It was meant to be private to the framework, but now I realize it has much wider use. I was thinking about using node_* attributes: they would share a common attribute namespace without taking too many names away from potential custom node attributes.
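To make the node_* idea concrete, here is a rough sketch of what I have in mind. The `Node` stand-in, the attribute names (`node_label`, `node_description`, `node_output`, `node_attributes`) and the `node_metadata` helper are all hypothetical and not part of the current framework.

```python
class Node(object):
    """Stand-in for the framework's root node class (hypothetical)."""
    # Defaults, so that every node exposes the same set of metadata names.
    node_label = None
    node_description = None
    node_output = None
    node_attributes = ()


class AggregateNode(Node):
    node_label = "Aggregate Node"
    node_description = "Aggregate values grouping by key fields."
    node_output = ("Key fields followed by aggregations for each "
                   "aggregated field. Last field is record count.")
    node_attributes = (
        {"name": "keys",
         "description": "List of fields according to which records are grouped"},
        {"name": "record_count_field",
         "description": "Name of a field where record count will be stored. "
                        "Default is `record_count`"},
    )


def node_metadata(cls):
    """Collect every node_* class attribute into one dictionary.

    The prefix is stripped from the keys, so node_label becomes "label".
    Inherited defaults are included because dir() walks the base classes.
    """
    prefix = "node_"
    return {name[len(prefix):]: getattr(cls, name)
            for name in dir(cls)
            if name.startswith(prefix)}
```

With this layout a front-end could call `node_metadata(AggregateNode)` and read the "label" or "attributes" entry without knowing anything else about the class, while custom nodes only need to avoid names starting with node_.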
My question is: What is the most common way of providing such metadata in Python programs? A single variable with a dictionary? Multiple variables, one for each metadata attribute (this would conflict with the restriction)? A custom class/structure? Multiple variables with some kind of prefix, like node_*?
Upvotes: 1
Views: 5049
Reputation: 1
The only element of a Python class able to modify the class definition itself (hence meta-data) is the __new__() function. __new__() is called before the object is actually created and before it is initialized. You can use it to read/modify your classes'/nodes' internal structure before they get initialized with __init__().
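A minimal sketch of that idea, assuming the `__node_info__` dictionary from the question: the base class inspects the metadata in `__new__()` before `__init__()` runs. The `_required_info` tuple and the error messages are hypothetical. Note that this check runs each time an instance is created, not when the class is defined.

```python
class Node(object):
    # Keys every node class is expected to provide (hypothetical requirement).
    _required_info = ("label", "description")

    def __new__(cls, *args, **kwargs):
        # Runs before __init__: read the class-level metadata and validate it.
        info = getattr(cls, "__node_info__", None)
        if info is None:
            raise TypeError("%s has no __node_info__" % cls.__name__)
        missing = [key for key in cls._required_info if key not in info]
        if missing:
            raise TypeError("%s.__node_info__ lacks: %s"
                            % (cls.__name__, ", ".join(missing)))
        # object.__new__ accepts no extra arguments; __init__ still gets them.
        return super(Node, cls).__new__(cls)


class AggregateNode(Node):
    __node_info__ = {
        "label": "Aggregate Node",
        "description": "Aggregate values grouping by key fields.",
    }

    def __init__(self, keys):
        self.keys = keys


class BrokenNode(Node):
    # The "description" key is missing, so instantiation raises TypeError.
    __node_info__ = {"label": "Broken"}
```

Here `AggregateNode(["year"])` is created normally, while `BrokenNode()` fails before its `__init__()` is ever called.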
Upvotes: 0
Reputation: 156238
A lot of the functionality you're describing overlaps with epydoc:
>>> class AggregateNode(base.Node):
...     r"""
...     Aggregate values grouping by key fields.
...
...     @ivar keys: List of fields according to which records are grouped
...
...     @ivar record_count_field: Name of a field where record count will be
...         stored.
...     """
...     record_count_field = "record_count"
...
...     def get_output(self):
...         r"""
...         @return: Key fields followed by aggregations for each aggregated field.
...             Last field is record count.
...         """
...
>>> import epydoc.docbuilder
>>> api = epydoc.docbuilder.build_doc(AggregateNode)
>>> api.variables['keys'].descr.to_plaintext(None)
u'List of fields according to which records are grouped\n\n'
>>> api.variables['record_count_field'].value.pyval
'record_count'
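If pulling in epydoc is not an option, the same docstring convention can be read with the standard library alone. The following is a rough sketch, not epydoc's API: the `ivar_descriptions` helper and its regular expression are my own assumptions and handle only simple `@ivar name: text` fields.

```python
import inspect
import re

# Matches "@ivar name: text", where text may continue over several lines
# until the next "@field" or the end of the docstring.
_IVAR = re.compile(r"@ivar\s+(\w+):\s*(.*?)(?=\n\s*@|\Z)", re.DOTALL)


def ivar_descriptions(cls):
    """Return {attribute name: description} parsed from cls's docstring."""
    doc = inspect.getdoc(cls) or ""
    # Collapse line breaks and indentation inside each description.
    return {name: " ".join(text.split())
            for name, text in _IVAR.findall(doc)}


class AggregateNode(object):
    r"""
    Aggregate values grouping by key fields.

    @ivar keys: List of fields according to which records are grouped

    @ivar record_count_field: Name of a field where record count will be
        stored.
    """
    record_count_field = "record_count"
```

Calling `ivar_descriptions(AggregateNode)` gives a dictionary keyed by `keys` and `record_count_field`, and default values can still be read from the class itself with `getattr`.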
Upvotes: 1
Reputation: 7359
I'm not sure if there is some "standard" way to store custom metadata in Python objects, but as an example, the Python implementation of dbus adds attributes with the "_dbus_" prefix to the published methods and signals.
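The same pattern is easy to reproduce with a decorator. This is only a sketch of the general technique, not dbus-python's actual code: the `_node_` prefix, the `node_method` decorator and the `published_methods` helper are hypothetical names.

```python
def node_method(label, description=""):
    """Decorator that tags a function with prefixed metadata attributes."""
    def decorate(func):
        # All metadata lives under one private prefix to avoid name clashes.
        func._node_is_method = True
        func._node_label = label
        func._node_description = description
        return func
    return decorate


class AggregateNode(object):
    @node_method("Aggregate",
                 description="Aggregate values grouping by key fields.")
    def run(self):
        pass

    def helper(self):
        # Not decorated, so it carries no metadata.
        pass


def published_methods(cls):
    """Names of the methods defined on cls that carry the marker attribute."""
    return sorted(name for name, member in vars(cls).items()
                  if getattr(member, "_node_is_method", False))
```

A tool can then discover the tagged methods with `published_methods(AggregateNode)` and read `_node_label` or `_node_description` from each one.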
Upvotes: 1