Reputation: 95
I have a C++ program from which I want to run a neural network. To make that easy, the C++ code assembles a batch of inputs for the network and then calls a Python script. The Python script returns a list, which I collect in C++. The code of my function is below.
void strings_to_pylist(PyObject* p_list_out, const vector<string>& c_list) const {
int i = 0;
for (const string& c_symbol : c_list) {
PyObject* p_symbol = PyUnicode_FromString(c_symbol.c_str());
PyList_SetItem(p_list_out, i, p_symbol);
++i;
}
}
const vector< pair<int, float> >
my_func(const vector< vector<string> >& query_traces, data_map& id) const {
PyObject* p_list = PyList_New(query_traces.size());
for(int i=0; i<query_traces.size(); i++){
PyObject* p_tmp = PyList_New(query_traces[i].size());
strings_to_pylist(p_tmp, query_traces[i]);
PyList_SetItem(p_list, i, p_tmp);
}
PyObject* p_result;
try{
p_result = PyObject_CallOneArg(query_func, p_list);
}
catch(...){
cout << "Running gc and trying again" << endl;
PyRun_SimpleString("gc.collect()");
p_result = PyObject_CallOneArg(query_func, p_list);
}
if (!PyList_Check(p_result))
throw std::runtime_error("Something went wrong, the Network did not return a list. What happened?");
vector< pair<int, float> > res;
for(int i=0; i<query_traces.size(); i++){
PyObject* p_type = PyList_GetItem(p_result, static_cast<Py_ssize_t>(i*2));
if(!PyUnicode_CheckExact(p_type)){
cerr << "Problem with type as returned by Python script. Is it a proper int?" << endl;
throw exception(); // force the catch block
}
PyObject* p_confidence = PyList_GetItem(p_result, static_cast<Py_ssize_t>(i*2 + 1));
if(!PyFloat_CheckExact(p_confidence)){
cerr << "Problem with type as returned by Python script. Is it a proper float?" << endl;
throw exception(); // force the catch block
}
int type = id.get_reverse_type(PyUnicode_AsUTF8(p_type));
if(type > id.get_alphabet_size()){
id.add_type(PyUnicode_AsUTF8(p_type));
}
res.emplace_back(type, static_cast<float>(PyFloat_AsDouble(p_confidence)));
//Py_SET_REFCNT(p_type, 0);
//Py_SET_REFCNT(p_confidence, 0);
}
//Py_DECREF(p_list);
//Py_DECREF(p_result);
return res;
}
Now I call this function a lot in my code, and it crashes after running out of memory (16 GB RAM). Trying to resolve that, I made the following observations:
Now to the fixes: At first I tried Py_DECREF(p_result); and Py_DECREF(p_list);. This did not resolve my memory leak; at best it slowed the growth. A second solution I tried was to also release the contents of p_result, using Py_SET_REFCNT() to set their refcounts to zero. This seemed to work for a while, but at some point it leads to segmentation faults in PyObject_CallOneArg() if and only if I combine it with Py_DECREF(p_result);. Interestingly, here the try{}catch{} block does not work. Now the questions:
Also, pointers on how best to debug this code would help greatly. For example, the valgrind tools do not seem usable here; they get stuck on the Python execution.
Upvotes: 4
Views: 68
Reputation: 95
Ok, I think I can answer my own question here, although not everything is answered yet.
CPython does cache small integers (roughly -5 to 256). Say I have x = 5 and y = 5; then x is y evaluates to True. That is why the refcount is so high: libraries imported by the Python script also use the very same integer objects I was using.
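This caching behavior can be checked from plain Python. The exact cache range is a CPython implementation detail (roughly -5 to 256) and not guaranteed by the language, so treat this as a sketch:

```python
import sys

# Construct the ints at runtime so the compiler cannot fold them
# into a single shared constant.
x = int("5")
y = int("5")
assert x is y                  # small ints come from CPython's cache

big_x = int("1000")
big_y = int("1000")
assert big_x is not big_y      # outside the cache: two distinct objects

# The cached 5 is referenced all over the interpreter and any
# imported libraries, hence its very large refcount.
print(sys.getrefcount(5))
```

This is exactly why forcing the refcount of such an object to zero corrupts state far outside your own code.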
The same concept applies.
This one I cannot answer yet, but since my Python script uses PyTorch, there appears to be an issue with that. I updated my PyTorch version, now explicitly call .detach() on everything, and use the GPU for inference where possible. This capped the memory-consumption growth at a much lower rate, although it still increases slowly even with dummy input. I will leave it as is for now, since at least my problem is solved, but an explanation would be welcome. I can provide example code.
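To narrow down where the remaining growth comes from on the Python side, the standard-library tracemalloc module can help. It only sees allocations made through Python's allocator, so PyTorch's C++/CUDA buffers are invisible to it, but it does localize pure-Python leaks. A minimal sketch, with a deliberately growing list standing in for the real workload:

```python
import tracemalloc

tracemalloc.start()

leaky = []
for _ in range(1000):
    leaky.append(bytearray(1024))   # stand-in for the real allocations

snapshot = tracemalloc.take_snapshot()
# Statistics are grouped per source line, biggest allocator first.
top_stat = snapshot.statistics("lineno")[0]
print(top_stat)                     # file, line, and bytes allocated there
tracemalloc.stop()
```

Taking two snapshots at different times and using snapshot.compare_to() shows only the growth between them, which is usually the more telling view for a slow leak.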
Using Py_SET_REFCNT() to set refcounts to zero led to segfaults because I was destroying objects shared with other code; see points 1 and 2. Below I explain the function and how refcounting works in CPython.
In general, C++ code cannot catch errors that happen in the underlying Python; the C API signals them through return values and the error indicator rather than C++ exceptions. For how to deal with errors in this situation see e.g. https://docs.python.org/3/extending/extending.html#intermezzo-errors-and-exceptions
I will describe below.
As for the code and the conventions here, the following applies. In my code I first construct a list; this list has a refcount of 1. Then, in a loop, I construct the objects p_symbol, each with a refcount of 1 as well (in my real code I use lists of lists, but the same principle applies). The function PyList_SetItem() "steals" the reference, i.e. transfers ownership of p_symbol to the list. So when p_symbol goes out of scope, the object it referred to still has a refcount of 1; freeing that memory is now the list's responsibility.
Then I call the Python function, passing my list. As per convention my function still owns p_list, and it is my function's responsibility to release it. Therefore, I should call Py_DECREF() at the end of my function, or whenever I no longer need p_list. Since the refcount of p_list is 1 before the call, Py_DECREF() will free p_list, which in turn decrefs every object it holds (freeing those whose refcount drops to zero).
The function PyList_GetItem() returns a borrowed reference; it does not transfer ownership to my function. Hence I should not touch the refcounts of the returned items; I must leave them as they are.
And finally, also by convention, CPython functions that create objects and return them transfer ownership to the caller. For my function this means it is now its responsibility to call Py_DECREF() on p_result afterwards. This again takes care of the contents of p_result as well.
I hope this explanation helps; feel free to modify it. In general, this resource helped me a lot: https://docs.python.org/3/extending/extending.html#
Also, reading the relevant code parts in the CPython GitHub repo: https://github.com/python/cpython
Upvotes: 1