Jack Avante

Reputation: 1595

Replacing Python's parser functionality?

First of all, I want to mention that I know this is a horrible idea and shouldn't be done. My intention is mainly curiosity: learning the innards of Python and how to 'hack' them.

I was wondering whether it is at all possible to change what happens when we use, for instance, [] to create a list. Is there a way to modify how the parser behaves so that ["hello world"] calls print("hello world") instead of creating a list with one element?

I've looked for documentation or posts about this but couldn't find any.

Below is an example of replacing the built-in dict to instead use a custom class:

from __future__ import annotations
from typing import List, Any
import builtins


class Dict(dict):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        return Dict({key: self[key] for key in keys})


builtins.dict = Dict

When this module is imported, it replaces the dict built-in with the Dict class. However, this only works when we call dict() directly. If we use {} instead, we still get the base dict built-in implementation:

import new_dict

a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}

print(type(a))
print(type(b))

Yields:

<class 'py_extensions.new_dict.Dict'>
<class 'dict'>

Upvotes: 5

Views: 813

Answers (2)

Nizam Mohamed

Reputation: 9240

The ast module is an interface to Python's Abstract Syntax Tree, which is built when Python code is parsed.
It's possible to replace a dict literal ({}) with a dict() call by modifying the Abstract Syntax Tree of the code before compiling it.

import ast
import new_dict

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

src = """

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

"""

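class RewriteDict(ast.NodeTransformer):
    def visit_Dict(self, node):
        # rewrite dict literals nested inside this one first
        self.generic_visit(node)
        # don't replace the literal in `dict({"a": 1})`
        parent = getattr(node, "parent", None)
        if (
            isinstance(parent, ast.Call)
            and isinstance(parent.func, ast.Name)  # func may be an attribute, e.g. `a.b({})`
            and parent.func.id == "dict"
        ):
            return node
        # replace `{"a": 1}` with `dict({"a": 1})`
        new_node = ast.Call(
            func=ast.Name(id="dict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        # reuse the literal's position, then fill in the new Name node's location
        return ast.fix_missing_locations(ast.copy_location(new_node, node))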


tree = ast.parse(src)

# set parent to every node
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        child.parent = node

RewriteDict().visit(tree)
exec(compile(tree, "ast", "exec"))

Output (the first three lines come from the code run directly, the last three from the rewritten source):

<class 'new_dict.Dict'>
<class 'dict'>
<class 'dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
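
The same approach extends to the list literal from the question. As a rough sketch (ListToPrint and rewrite_and_run are hypothetical names, not part of the ast module), a transformer can turn every list display into a print call. It only affects source that you parse and compile yourself, not the interpreter's own parser:

```python
import ast


class ListToPrint(ast.NodeTransformer):
    """Hypothetical transformer: rewrite `[a, b]` into `print(a, b)`."""

    def visit_List(self, node):
        # Rewrite list literals nested inside this one first.
        self.generic_visit(node)
        # Leave assignment targets such as `[a, b] = pair` untouched.
        if not isinstance(node.ctx, ast.Load):
            return node
        call = ast.Call(
            func=ast.Name(id="print", ctx=ast.Load()),
            args=node.elts,
            keywords=[],
        )
        return ast.copy_location(call, node)


def rewrite_and_run(src):
    """Parse src, apply ListToPrint, then compile and execute the result."""
    tree = ListToPrint().visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<rewritten>", "exec"), {})


rewrite_and_run('["hello world"]')  # prints: hello world
```

Note that a rewritten list expression now evaluates to None (the return value of print), so anything that relied on getting a list back will break.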

Upvotes: 1

Chris

Reputation: 1643

[] and {} compile to dedicated opcodes that always build a list or a dict, respectively. list() and dict(), on the other hand, compile to bytecode that looks up the names list and dict in the global (and then built-in) namespace and calls whatever it finds:

import dis

dis.dis(lambda:[])
dis.dis(lambda:{})
dis.dis(lambda:list())
dis.dis(lambda:dict())

prints (with some additional newlines for clarity; the exact opcode names vary between CPython versions):

  3           0 BUILD_LIST               0
              2 RETURN_VALUE

  5           0 BUILD_MAP                0
              2 RETURN_VALUE

  7           0 LOAD_GLOBAL              0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

  9           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

Thus you can change what dict() returns simply by rebinding the name dict, but you can't change what {} returns.
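
A small self-contained check of this, assuming a hypothetical Dict subclass and rebinding dict only inside a throwaway namespace rather than in builtins:

```python
class Dict(dict):
    """Hypothetical stand-in for a custom dict class."""


# Names in exec'd code are looked up in this namespace before builtins,
# so `dict` resolves to Dict there, while `{}` never looks up a name at all.
namespace = {"dict": Dict}
exec("a = dict(x=1)\nb = {'x': 1}", namespace)

print(type(namespace["a"]).__name__)  # Dict
print(type(namespace["b"]).__name__)  # dict
```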

These opcodes are documented in the dis module documentation. If the BUILD_MAP opcode runs, you get a dict; there is no way around it. As an example, the implementation of BUILD_MAP in CPython calls the function _PyDict_FromItems. It doesn't look at any kind of user-defined class; it specifically makes the C struct that represents a Python dict.

It is possible, in at least some cases, to manipulate Python bytecode at runtime. If you really wanted to make {} return a custom class, I suppose you could write some code that searches for the BUILD_MAP opcode and replaces it with the appropriate opcodes. Those opcodes aren't the same size, though, so there are probably quite a few additional changes you'd have to make.
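
As a first step in that direction, the BUILD_MAP instructions in a function can at least be located with dis.get_instructions. This is only a sketch of the search part, not of the rewriting (sample is a made-up function), and depending on the CPython version, literals with several constant keys may compile to a different opcode such as BUILD_CONST_KEY_MAP:

```python
import dis


def sample(key):
    empty = {}       # compiles to BUILD_MAP with argument 0
    one = {key: 1}   # non-constant key: BUILD_MAP with argument 1
    return empty, one


# Collect (offset, argument) for every BUILD_MAP instruction in sample.
build_maps = [
    (ins.offset, ins.arg)
    for ins in dis.get_instructions(sample)
    if ins.opname == "BUILD_MAP"
]

for offset, size in build_maps:
    print(f"BUILD_MAP at byte offset {offset}, {size} item(s)")
```

The byte offsets differ between interpreter versions, which is part of why patching bytecode in place is fragile.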

Upvotes: 3
