hardik24

Reputation: 1058

How to walk through Python opcodes while debugging CPython?

I want to understand how the Python interpreter works. I understand how opcodes are generated and want to better understand the interpreter part. I have read a lot online and learned about the for (;;) loop in the ceval.c file of the Python interpreter (CPython).

Now I want to trace the interpretation of the following Python code, a.py:

a = 4
b = 5
c = a + b

When I run python -m dis a.py, I get:

  1           0 LOAD_CONST               0 (4)
              2 STORE_NAME               0 (a)

  2           4 LOAD_CONST               1 (5)
              6 STORE_NAME               1 (b)

  3           8 LOAD_NAME                0 (a)
             10 LOAD_NAME                1 (b)
             12 BINARY_ADD
             14 STORE_NAME               2 (c)
             16 LOAD_CONST               2 (None)
             18 RETURN_VALUE

I have set a breakpoint on the switch (opcode) line in ceval.c. When I start the debugger, it stops at this position more than 2000 times. I think this is because Python has to interpret other code during startup before it runs my program. So my question is: how do I debug only the relevant opcode instructions?

Basically, how do I know that the instructions I am debugging actually come from the program I wrote?

Please help me out with this. Thanks in advance.

Upvotes: 5

Views: 960

Answers (1)

MiniMax

Reputation: 1093

I do a lot of CPython debugging to better understand how it works. I worked around the inability to set a gdb breakpoint in Python source files by writing a C extension module.

The idea: CPython is a big program written in C, so we can debug it like any other C program. If we want to stop execution when the _PyType_Lookup function starts, we just run break _PyType_Lookup. Thus, if we add our own C function to CPython, for example cbreakpoint, we can stop execution every time cbreakpoint is called. And if we find a way to call this cbreakpoint function from source.py, we get the required functionality: every time the interpreter reaches a cbreakpoint call, gdb stops it (provided we ran break cbreakpoint beforehand). We can do that by writing a C extension.
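To see what the interpreter will execute right after each marker call, it can help to list the bytecode of a marked file grouped by source line. The following is a minimal sketch and not part of the original setup: it only compiles the text, so my_breakpoint does not need to be installed. The SOURCE string and the opcodes_by_line helper are illustrative names, and the exact opcode names differ between CPython versions.

```python
import dis
from collections import defaultdict

# Illustrative copy of a marked source file. It is only compiled, never
# executed, so the my_breakpoint extension does not have to be importable.
SOURCE = """\
from my_breakpoint import cbreakpoint

cbreakpoint(1)
a = 4
"""


def opcodes_by_line(source, filename="source.py"):
    """Return {line number: [opcode names]} for the module-level code."""
    code = compile(source, filename, "exec")
    # Map bytecode offset -> source line for every offset that starts a line.
    line_starts = dict(dis.findlinestarts(code))
    grouped = defaultdict(list)
    current_line = 0
    for instr in dis.get_instructions(code):
        line = line_starts.get(instr.offset)
        if line is not None:
            current_line = line
        grouped[current_line].append(instr.opname)
    return dict(grouped)


result = opcodes_by_line(SOURCE)
for line, names in sorted(result.items()):
    print(line, names)
```

The opcodes listed for the line after a cbreakpoint call are the ones to expect in the ceval.c switch once gdb has returned from cbreakpoint.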

How I did it (I may have missed something, because I am reproducing this from memory):

  1. Downloaded the CPython source into the ~/learning_python/cpython-master directory and compiled it. There were some intricacies; see Can't get rid of “value has been optimized out” in GDB.
  2. Created the module itself - my_breakpoint.c.
  3. Created a setup file - my_breakpoint_setup.py.
  4. Ran the following command:

    ~/learning_python/cpython-master/python my_breakpoint_setup.py build
    

    It created a my_breakpoint.cpython-38dm-x86_64-linux-gnu.so file.

  5. Copied the shared object file from the previous step into CPython's Lib directory:

    cp -iv my_breakpoint.cpython-38dm-x86_64-linux-gnu.so ~/learning_python/cpython-master/Lib/
    

    The copy is only for convenience; otherwise the .so file would have to be present in every directory from which we want to import this module.

  6. Now we can write the following source.py:

    #!/usr/bin/python3
    
    from my_breakpoint import cbreakpoint
    
    cbreakpoint(1)
    a = 4
    
    cbreakpoint(2)
    b = 5
    
    cbreakpoint(3)
    c = a + b
    

    To execute this file we must use our ~/learning_python/cpython-master interpreter, not the system's python3, because the system's Python doesn't have the my_breakpoint module:

    ~/learning_python/cpython-master/python source.py
    
  7. To debug this file, run:

    gdb --args ~/learning_python/cpython-master/python -B source.py
    

    Then, inside gdb:

    (gdb) start
    
    (gdb) break cbreakpoint
    Function "cbreakpoint" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 2 (cbreakpoint) pending.
    
    (gdb) cont
    

    There is one problem. After cont, gdb stops at the beginning of the cbreakpoint function, and you need many next commands to get out of this function and the CPython calling code before you reach the execution of the desired Python code. Alternatively, you can set a new breakpoint once cbreakpoint has been hit (the line number comes from my checkout of ceval.c and will differ in other versions), like:

    (gdb) break ceval.c:1080 ### The LOAD_CONST case beginning
    (gdb) cont
    

    After doing this many times I automated these actions, so you can just add these lines to your ~/.gdbinit:

    set breakpoint pending on
    break cbreakpoint
        command $bpnum
        tbreak ceval.c:1098
            command $bpnum
            n
            end
        cont
        end
    set breakpoint pending off
    

    Now just start gdb as in step 7 and do:

    (gdb) start
    (gdb) cont
    

    and you will jump to the beginning of the source.py code execution.
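If the same source.py should also run under an interpreter that lacks my_breakpoint (for example the system's python3), one option is a pure-Python fallback. This is a sketch and not part of the steps above: the fallback only mimics the return value of the C function, and gdb cannot break on it because it is not a C symbol.

```python
try:
    # Available only under the interpreter whose Lib/ directory holds the .so file.
    from my_breakpoint import cbreakpoint
except ImportError:
    def cbreakpoint(breakpoint_id):
        """Pure-Python stand-in: returns its argument like the C version."""
        return breakpoint_id

cbreakpoint(1)
a = 4

cbreakpoint(2)
b = 5

cbreakpoint(3)
c = a + b
```

With this guard the file stays runnable everywhere, while the gdb breakpoint still only fires under the custom build.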

my_breakpoint.c

#include <Python.h>

static PyObject* cbreakpoint(PyObject *self, PyObject *args){
    int breakpoint_id;

    if(!PyArg_ParseTuple(args, "i", &breakpoint_id))
        return NULL;

    return Py_BuildValue("i", breakpoint_id);
}

static PyMethodDef my_methods[] = { 
    {"cbreakpoint", cbreakpoint, METH_VARARGS, "breakpoint function"},  
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef my_breakpoint = { 
    PyModuleDef_HEAD_INIT,  
    "my_breakpoint",
    "the module for setting C breakpoint in the Python source",
    -1, 
    my_methods
};

PyMODINIT_FUNC PyInit_my_breakpoint(void){
    return PyModule_Create(&my_breakpoint);
}

my_breakpoint_setup.py

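# distutils was removed in Python 3.12; on newer versions use
# "from setuptools import setup, Extension" instead.
from distutils.core import setup, Extension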

module = Extension('my_breakpoint', sources = ['my_breakpoint.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a package for my_breakpoint module',
       ext_modules = [module])
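The name of the file produced by the build step depends on the interpreter that ran the build (the 38dm part above reflects my debug build). A small sketch for predicting it, meant to be run with the same interpreter; expected_name is just an illustrative variable:

```python
import importlib.machinery

# EXTENSION_SUFFIXES lists the file suffixes this interpreter accepts for
# extension modules; the first entry is typically the most specific one.
suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
expected_name = "my_breakpoint" + suffix
print(expected_name)
```

If the printed name does not match the file in the build directory, the build was probably run with a different interpreter than the one you are debugging.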

P.S.

I asked a similar question in the past; it may be useful to you: The optimal way to set a breakpoint in the Python source code while debugging CPython by GDB.

Upvotes: 4
