blueshift

Reputation: 6882

Arrange a Python object to discover what name it is assigned to

TL;DR summary

How can I write foo = MyClass() and have the MyClass object know that its name is foo?

Full Question

I want to write a library in Python that allows constructing a tree of objects, which support recursive traversal and naming. I would like to make this easy to use by automating the discovery of an object's name and parent object at the time it is created and added to the parent object. In this simple example, the name and parent object have to be explicitly passed into each object constructor:

class TreeNode:
    def __init__(self, name, parent):
        self.name = name
        self.children = []
        self.parent = parent
        if parent is None:
            self.fullname = name
        else:
            parent.register_child(self)

    def register_child(self, child):
        self.children.append(child)
        child.fullname = self.fullname + "." + child.name

    def recursive_print(self):
        print(self.fullname)
        for child in self.children:
            child.recursive_print()

class CustomNode(TreeNode):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.foo = TreeNode(name="foo", parent=self)
        self.bar = TreeNode(name="bar", parent=self)

root = TreeNode(name="root", parent=None)
root.a = CustomNode(name="a", parent=root)

root.recursive_print()

output:

root
root.a
root.a.foo
root.a.bar

What I would like to be able to do is omit the explicit name and parent arguments, something like:

class CustomNode(TreeNode):
    def __init__(self):
        self.foo = TreeNode()
        self.bar = TreeNode()

root = TreeNode(parent=None)
root.a = CustomNode()

I have a partial solution at the moment where TreeNode.__setattr__() checks whether it is assigning a new TreeNode and, if so, names it and registers it. One shortcoming is that TreeNode.__init__() cannot know its name or parent until after it returns; having them available inside __init__() would be preferable.
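For readers who want to see what such a partial solution might look like, here is a minimal sketch. It is an assumption about the approach described above, not the asker's actual code: fullname is turned into a property so that it stays correct even though a node only learns its name and parent after its own __init__() has returned.

```python
class TreeNode:
    def __init__(self, name=None, parent=None):
        # name and parent are usually unknown here; __setattr__ fills them in later
        self.children = []
        self.name = name
        self.parent = parent

    def __setattr__(self, attr, value):
        # Intercept "node.attr = TreeNode()" and register the new child.
        # The "parent" attribute is skipped so a back-reference is not
        # mistaken for a child.
        if isinstance(value, TreeNode) and attr != "parent":
            value.name = attr
            value.parent = self
            self.children.append(value)
        super().__setattr__(attr, value)

    @property
    def fullname(self):
        # Computed on demand, because parents are named after their children
        if self.parent is None:
            return self.name
        return f"{self.parent.fullname}.{self.name}"

    def recursive_print(self):
        print(self.fullname)
        for child in self.children:
            child.recursive_print()


class CustomNode(TreeNode):
    def __init__(self):
        super().__init__()
        self.foo = TreeNode()
        self.bar = TreeNode()


root = TreeNode(name="root")
root.a = CustomNode()
root.recursive_print()  # prints root, root.a, root.a.foo, root.a.bar
```

Note that inside CustomNode.__init__() the node still has no name, which is exactly the shortcoming mentioned above.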

I am wondering if there is some neat way to do what I want, using metaclasses or some other feature of the language.

Upvotes: 1

Views: 117

Answers (3)

jsbueno

Reputation: 110271

Ordinarily, there is no way to do that. In Python, a "name" is just a reference to the actual object: an object can have several names pointing to it, as in x = y = z = MyNode(), or no name at all, if it is put straight into a data structure as in mylist.append(MyNode()).

So, keep in mind that even native structures from the language itself require one to repeat the name as a string in cases like this: for example, when creating namedtuples (point = namedtuple("point", "x y")) or when creating classes programmatically by calling type, as in MyClass = type("MyClass", (), {}).
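The two points above can be checked directly. This is only an illustration; MyNode is a stand-in class with no behavior:

```python
from collections import namedtuple


class MyNode:
    pass


# One object, three names: none of them is "the" name of the object
x = y = z = MyNode()
print(x is y is z)  # True

# An object with no name at all
mylist = []
mylist.append(MyNode())

# Standard library and builtins also ask for the name to be repeated as a string
point = namedtuple("point", "x y")
MyClass = type("MyClass", (), {})
print(point.__name__, MyClass.__name__)  # point MyClass
```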

Of course, given Python's introspection capabilities, the TreeNode constructor could retrieve the frame of the function it was called from by using sys._getframe(), then retrieve the text of the source code and, if it is a simple well-formed line like self.foo = TreeNode(), extract the name foo from it by manipulating the string. You should not do that, due to the considerations given above. (Also, the source code may not always be available to the running program, in which case this method would not work.)

If you are always creating the nodes inside methods, the second most straightforward thing to do seems to be adding a short method to do it. The most straightforward is still typing the name twice, just as you are doing.

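class CustomNode(TreeNode):
    def __init__(self, **kwargs):
        # TreeNode.__init__ must run first so that children and fullname exist
        super().__init__(**kwargs)
        self.add_node("foo")
        self.add_node("bar")
        print(self.foo.fullname)  # attribute can be used, as it is set in the method

    def add_node(self, name):
        # the name is typed once and used as both attribute name and node name
        setattr(self, name, TreeNode(name=name, parent=self))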

There are some exceptions in the language to typing names twice, though. The one meant for this kind of thing is that fixed attributes in a class can have a special method (__set_name__) through which they get to know their name. However, they are set per class, and if you need separate instances of TreeNode in each instance of CustomNode, some other code has to be put in so that the new nodes are instantiated lazily, or when the container class is instantiated.

In this case, it looks like it is possible to simply create a new TreeNode instance whenever the attribute is first accessed on a new instance. __set_name__ is part of the descriptor protocol, the same mechanism used by Python's property builtin. If new nodes are created empty by default, it is easy to do, and you then control their attributes:


class ClsTreeNode:
    def __set_name__(self, owner, name):
        self.name = name
        
    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = getattr(instance, "_" + self.name, None)
        if value is None:
            value = TreeNode(name=self.name, parent=instance)
            setattr(instance, "_" + self.name, value)
        return value
    
    def __set__(self, instance, value):
        # if setting a new node is not desired, just raise a ValueError
        if not isinstance(value, TreeNode):
            raise TypeError("...")
        # adjust the name and parent of the node being assigned,
        # or accept the node content and create the new TreeNode here:
        # one is free to do whatever is wanted.
        value.name = self.name
        value.parent = instance
        setattr(instance, "_" + self.name, value)



class CustomNode(TreeNode):
    foo = ClsTreeNode()
    bar = ClsTreeNode()

This, as stated, will only work if ClsTreeNode is a class attribute - check the descriptor protocol docs for more details.
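As a quick check of the behavior described, here is a self-contained sketch that pairs a read-only version of the descriptor with the TreeNode class from the question. The instance names first and second are made up for the example:

```python
class TreeNode:
    def __init__(self, name, parent):
        self.name = name
        self.children = []
        self.parent = parent
        if parent is None:
            self.fullname = name
        else:
            parent.register_child(self)

    def register_child(self, child):
        self.children.append(child)
        child.fullname = self.fullname + "." + child.name


class ClsTreeNode:
    def __set_name__(self, owner, name):
        # called once, when the owning class body is executed
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = getattr(instance, "_" + self.name, None)
        if value is None:
            # first access on this instance: create and cache its own node
            value = TreeNode(name=self.name, parent=instance)
            setattr(instance, "_" + self.name, value)
        return value


class CustomNode(TreeNode):
    foo = ClsTreeNode()
    bar = ClsTreeNode()


first = CustomNode(name="first", parent=None)
second = CustomNode(name="second", parent=None)

print(first.foo.fullname)       # first.foo
print(first.foo is first.foo)   # True: the node is cached per instance
print(first.foo is second.foo)  # False: each instance gets its own node
```

One consequence of the lazy creation is that a child only shows up in its parent's children list after the attribute has been accessed at least once.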

The other way of not having to type the name twice again falls in the "hackish, do not use" category: abusing the class statement. With a proper custom metaclass, the class foo(metaclass=TreeNodeMeta): pass statement does not need to create a new class and can instead return any new object, and the call to the metaclass __new__ method is passed the name of the class. It would still have to resort to inspecting the frame object in the call stack to find out about its parent (while, as can be seen above, by using the descriptor protocol, one gets the parent object for free).
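Purely to show the mechanism (and not as a recommendation), a minimal sketch of that class-statement trick could look like this. Node is a hypothetical stand-in for TreeNode, and the parent lookup is left out:

```python
class Node:
    # hypothetical stand-in for TreeNode, holding only a name
    def __init__(self, name):
        self.name = name


class TreeNodeMeta(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        # Do not build a class at all: return an ordinary Node that
        # receives the name used in the class statement.
        return Node(name)


class foo(metaclass=TreeNodeMeta):
    pass


# "foo" is now bound to a Node instance, not to a class
print(type(foo).__name__, foo.name)  # Node foo
```

Because the returned object is not an instance of the metaclass, the metaclass __init__ is not called on it.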

Upvotes: 3

Kaleb Barrett

Reputation: 1594

I don't think what you are trying to achieve is possible without very smart AST metaprogramming. So that leaves us with alternatives. If your hierarchy is static, you could do a two stage initialization.

You define the type with the child fields' names and types listed as attribute annotations, and instrument it with either a decorator or a metaclass. I'd prefer the decorator: metaclasses and inheritance can get pretty icky.

@node
class TopNode:
    subnode1: OtherNode1
    subnode2: OtherNode2

    def init(self, arg1, arg2=None):
        self.subnode1.init(arg1)
        self.subnode2.init(arg1, arg2=arg2)
        self.a = 1

Instead of running regular call-then-setattr instantiations in an __init__, you would call an init function that works on an object that has already been instantiated. This is a lot like how C++ classes work. Before you ask, I don't think dataclasses will work here, or if they would, it would be painful.
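The node decorator is not spelled out above, so the following is only one possible sketch of it, under the assumption that stage one builds every annotated child with its name and parent, and stage two is the explicit init call. Leaf, the default name "root", and the fullname attribute are inventions for this example:

```python
def node(cls):
    # Assumption: annotations are real classes, not strings
    child_types = dict(getattr(cls, "__annotations__", {}))

    def __init__(self, name="root", parent=None):
        # Stage one: name and parent are known before any user code runs
        self.name = name
        self.parent = parent
        self.fullname = name if parent is None else f"{parent.fullname}.{name}"
        for attr, child_type in child_types.items():
            setattr(self, attr, child_type(name=attr, parent=self))

    cls.__init__ = __init__
    return cls


@node
class Leaf:
    def init(self, value):
        self.value = value


@node
class TopNode:
    subnode1: Leaf
    subnode2: Leaf

    def init(self, arg1, arg2=None):
        # Stage two: children already exist and already know their names
        self.subnode1.init(arg1)
        self.subnode2.init(arg2)
        self.a = 1


top = TopNode()
top.init(10, arg2=20)
print(top.subnode2.fullname, top.subnode2.value)  # root.subnode2 20
```

In this sketch every class in the hierarchy has to be decorated, including leaves with no annotated children.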

Upvotes: 1

Lenormju

Reputation: 4368

There is the traceback library:

This module provides a standard interface to extract, format and print stack traces of Python programs.

Here is a proof of concept:

import traceback


def foo():
    stack = traceback.extract_stack()
    frame = stack[-2]  # 0 is the root, -1 is the current one (`foo`), -2 is the one before (the caller)
    # we have some information :
    print(f"{frame.name=} {frame.locals=!r}")
    print(f"{frame.filename=!r} {frame.lineno=}")
    print(f"{frame.line=!r}")
    assign_name = frame.line.split("=")[0].strip()  # FIXME !!!!
    return assign_name


a = foo()
print(a)

output:

frame.name='<module>' frame.locals=None
frame.filename='/home/stack_overflow/so70873290.py' frame.lineno=15
frame.line='a = foo()'
a

It is based on fetching the previous frame from the Python call stack. Using that in your __init__ method would let the object know the name it is being assigned to.

But it has many problems:

  • what happens if I do not assign, just call your TreeNode? It would not know its name.
  • what happens if I assign to self.something.else.foo? Would the node name be something.else.foo? What if something is a TreeNode?

Also, my solution using traceback does not work if my assignment is spread over multiple lines:

a = (
    foo()
)

Here, traceback finds its limits. Under the hood, traceback.extract_stack just formats what's in sys._getframe().f_back.

Instead, we can use the real deal: the Python bytecode! Although inspect may be used for that purpose, here I used dis:

import dis
import itertools
import sys


def foo():
    frame = sys._getframe()  # /!\ It is not guaranteed to exist in all implementations of Python.
    lasti = frame.f_back.f_lasti

    assign_instruction = next(
        itertools.dropwhile(
            lambda inst: inst.opname != "STORE_NAME",  # compare by name: numeric opcodes differ between Python versions
            itertools.dropwhile(
                lambda inst: inst.offset <= lasti,
                dis.get_instructions(frame.f_back.f_code)  # generator
            )
        ),
        None
    )
    return assign_instruction.argval


a = (
    foo()
)
print(a)

output:

a

But as explained in jsbueno's answer, I warn against using that. Still, all of it is in the standard library, and it is used, with caution, to implement useful features (that is, for example, how traceback.walk_stack works).


Upvotes: 0
