Yair Daon
Yair Daon

Reputation: 1123

Extend python with C, return numpy array gives garbage

I am wrapping a C file so I can use it in python. The output of the C function is an array of doubles. I want this to be a numpy array in python. I get garbage. Here's an example that generates the error.

First, the C file (focus on the last function definition, everything else should be OK):

#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>

static char module_docstring[] =
    "docstring";

static char error_docstring[] =
        "generate the error";

static PyObject *_aux_error(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
        {"error", _aux_error, METH_VARARGS, error_docstring},
        {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC init_tmp(void) {

    PyObject *m = Py_InitModule3("_tmp", module_methods, module_docstring);
    if (m == NULL)
        return;

    /* Load `numpy` functionality. */
    import_array();
}

static PyObject *_aux_error(PyObject *self ,PyObject *args) {

    double vector[2] = {1.0 , 2.0};

    npy_intp dims[1] = { 2 };

    PyObject *ret  = PyArray_SimpleNewFromData(1, dims, (int)NPY_FLOAT , vector );
    return ret;
}

Compilation goes OK (from what I understand - I used a python script that compiles everything).

In python, I run the following script to test my new module:

try:
    import _tmp
    res = _tmp.error()
    print(res)
except:
    print("fail")

The result I see on the screen is garbage. I tried substituting (int)NPY_FLOAT with (int)NPY_FLOAT32, (int)NPY_FLOAT64, (int)NPY_DOUBLE and I still get garbage. I am using python2.7.

Thanks!!!

EDIT: following the answer below, I changed the last function to:

static PyObject *_aux_error(PyObject *self, PyObject *args) {


    double *vector = calloc(2, sizeof(double));
    vector[0] = 1.0;
    vector[1] = 2.0;


    npy_intp *dims = calloc(1 , sizeof(npy_intp));
    dims[1] = 2;


    PyObject *ret  = PyArray_SimpleNewFromData(1, dims, (int)NPY_FLOAT , &vector );
    return ret;
}

Now python shows an empty array.

Upvotes: 5

Views: 4165

Answers (3)

user4382923
user4382923

Reputation:

Warren's solution seems to be working, although freeing the C array memory block results in a error on compilation for me. I got the memcopy trick to work in the minimalistic function below (copying a 1D C array to numpy via the pointer), which, for simplicity, does not take any arguments, and should give the reader a good idea how to apply this to C arrays instead of vectors:

static PyObject *_cmod_test(PyObject *self, PyObject *args)
    {
    double f[5] = {0,1,2,3,4};
    int d[1] = {5};
    PyObject *c = PyArray_FromDims(1,d,NPY_DOUBLE);
    memcpy(PyArray_DATA(c), f, 5*sizeof(double));
    return c;    
    };

The launch.py script is simple

import _cmod
_cmod.test()

Don't forget to declare the functions

#include <Python.h>
#include <numpy/arrayobject.h>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
static PyObject *_cmod_test(PyObject *self, PyObject *args);

Any suggestions for the use with PyArray_SimpleNewFromData (whilst avoiding the memory leak pittfall)? Perhaps something similar to the broken code below.

static PyObject *_cmod_test(PyObject *self, PyObject *args)
    {
    double f[5] = {0,1,2,3,4};
    npy_intp dims[1] = {5};
    PyObject *c = PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE ,f);
    PyArray_ENABLEFLAGS(c, NPY_ARRAY_OWNDATA);
    return c;
    };

I also recommend Dan Foreman Mackay's blog on the python C API.

Upvotes: 2

Warren Weckesser
Warren Weckesser

Reputation: 114831

Try changing this:

static PyObject *_aux_error(PyObject *self) {

to this:

static PyObject *_aux_error(PyObject *self, PyObject *args) {

Python will pass the args argument, even if you don't define your function with it.

There's still a fundamental problem with your code. You have created a numpy array using an array, vector, that is on the stack. When _aux_error returns, that memory is reclaimed and might be reused.

You could create the array using PyArray_SimpleNew() to allocate the numpy array, and then copy vector to the array's data:

static PyObject *_aux_error(PyObject *self, PyObject *args)
{
    double vector[2] = {1.0 , 2.0};
    npy_intp dims[1] = {2};

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    memcpy(PyArray_DATA(ret), vector, sizeof(vector));
    return ret;
}

Note that I changed the type to NPY_DOUBLE; NPY_FLOAT is the 32 bit floating point type.


In a comment, you asked about dynamically allocating the memory in _aux_error. Here's a variation of the example that might be useful. The length of the array is still hardcoded in dims, so it isn't completely general, but it might be enough to address the question from the comments.

static PyObject *_aux_error(PyObject *self, PyObject *args)
{
    double *vector;
    npy_intp dims[1] = {5};
    npy_intp k;

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    vector = (double *) PyArray_DATA(ret);
    /*
     *  NOTE: Treating PyArray_DATA(ret) as if it were a contiguous one-dimensional C
     *  array is safe, because we just created it with PyArray_SimpleNew, so we know
     *  that it is, in fact, a one-dimensional contiguous array.
     */
    for (k = 0; k < dims[0]; ++k) {
        vector[k] = 1.0 + k;
    }
    return ret;
}

Upvotes: 8

Yair Daon
Yair Daon

Reputation: 1123

Here is my full solution, for your amusement. Copy, paste and modify. Obviously the problem I was faced with is a bit more complicated than the question above. I used some of Dan Foreman Mackay's online code.

The goal of my code is to return a covariance vector (whatever that is). I have a C file named aux.c that returns a newly allocated array:

#include "aux.h"
#include <math.h>
#include <stdlib.h>
double *covVec(double *X, double *x, int nvecs, int veclen) {


    double r = 1.3;
    double d = 1.0;

    double result;
    double dist;
    int n;

    double *k;
    k = malloc(nvecs * sizeof(double));

    int row;
    for( row = 0 ; row < nvecs ; row++) {

        result = 0.0;
        for (n = 0; n < veclen; n++) {
                dist = x[n] - X[row*veclen + n];
                result += dist * dist;
        }

        result = d*exp(  -result/(2.0*r*r)  );
        k[row] = result;
    }
    return k;
}

Then, I need a very short header file named aux.h:

double *covVec(double *X, double *x, int nvecs, int veclen);

To wrap this to python I have _aux.c:

#include <Python.h>
#include <numpy/arrayobject.h>
#include "aux.h"
#include <stdio.h>

static char module_docstring[] =
    "This module provides an interface for calculating covariance using C.";

static char cov_vec_docstring[] =
    "Calculate the covariances between a vector and a list of vectors.";

static PyObject *_aux_covVec(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
        {"cov_vec", _aux_covVec, METH_VARARGS, cov_vec_docstring},
        {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC init_aux(void) {

    PyObject *m = Py_InitModule3("_aux", module_methods, module_docstring);
    if (m == NULL)
        return;

    /* Load `numpy` functionality. */
    import_array();
}


static PyObject *_aux_covVec(PyObject *self, PyObject *args)
{
    PyObject *X_obj, *x_obj;

    /* Parse the input tuple */
    if (!PyArg_ParseTuple(args, "OO", &X_obj, &x_obj ))
        return NULL;

    /* Interpret the input objects as numpy arrays. */
    PyObject *X_array = PyArray_FROM_OTF(X_obj, NPY_DOUBLE, NPY_IN_ARRAY);
    PyObject *x_array = PyArray_FROM_OTF(x_obj, NPY_DOUBLE, NPY_IN_ARRAY);


    /* If that didn't work, throw an exception. */
    if (X_array == NULL || x_array == NULL ) {
        Py_XDECREF(X_array);
        Py_XDECREF(x_array);
        return NULL;
    }

    /* What are the dimensions? */
    int nvecs  = (int)PyArray_DIM(X_array, 0);
    int veclen = (int)PyArray_DIM(X_array, 1);
    int xlen   = (int)PyArray_DIM(x_array, 0);

    /* Get pointers to the data as C-types. */
    double *X    = (double*)PyArray_DATA(X_array);
    double *x    = (double*)PyArray_DATA(x_array);


    /* Call the external C function to compute the covariance. */
    double *k = covVec(X, x, nvecs, veclen);



    if ( veclen !=  xlen ) {
        PyErr_SetString(PyExc_RuntimeError,
                                "Dimensions don't match!!");
        return NULL;
    }

    /* Clean up. */
    Py_DECREF(X_array);
    Py_DECREF(x_array);

    int i;
    for(i = 0 ; i < nvecs ; i++) {
        printf("k[%d]   = %f\n",i,k[i]);
        if (k[i] < 0.0) {
            PyErr_SetString(PyExc_RuntimeError,
                        "Covariance should be positive but it isn't.");
            return NULL;
        }
    }

    npy_intp dims[1] = {nvecs};

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    memcpy(PyArray_DATA(ret), k, nvecs*sizeof(double));
    free(k);

    return ret;
}

I have a python file called setup_cov.py:

from distutils.core import setup, Extension
import numpy.distutils.misc_util

setup(
    ext_modules=[Extension("_aux", ["_aux.c", "aux.c"])],
    include_dirs=numpy.distutils.misc_util.get_numpy_include_dirs(),
)

I compile from command line using python2.7 setup_cov.py build_ext --inplace. Then I run the following python test file:

import numpy as np
import _aux as a

nvecs  = 6
veclen = 9
X= []
for _ in range(nvecs):
    X.append(np.random.normal(size= veclen))
X = np.asarray(X)

x = np.random.normal(size=veclen)
k = a.cov_vec(X,x)
print(k)

Upvotes: 3

Related Questions