I have been struggling for a few days to understand how to access numpy arrays from a C extension, but I have trouble understanding the documentation.
Edit: Here is the code I would like to port to C (the grav function):
import numpy as np

def grav(p, M):
    G = 6.67408*10**-2  # m³/s²T
    l = len(p[0])
    a = np.empty(shape=(2, l))
    a[:, 0] = 0
    for b in range(1, l):
        # computing the distance between body #b and all previous
        d = p[:, b:b+1] - p[:, :b]
        d2 = (d*d).sum(axis=0)
        d2[d2 == 0] = 1
        XXX = G * d * d2**(-1.5)
        # computing Newton's formula:
        # acceleration undergone by b from all previous
        a[:, b] = -(M[None, :b] * XXX).sum(axis=1)
        # computing Newton's formula: adding, for each previous body,
        # the acceleration undergone from b
        a[:, :b] += M[b] * XXX
    return a

system_p = np.array([[1., 2., 3., 4., 5., 6., 7., 9., 4., 0.],
                     [3., 2., 5., 6., 3., 5., 6., 3., 5., 8.]])
system_M = np.array([3., 5., 1., 2., 4., 5., 4., 5., 6., 8.])
system_a = grav(system_p, system_M)

for i in range(len(system_p[0])):
    print('body {:2} mass = {}(ton), position = {}(m), '
          'acceleration = [{:8.4f} {:8.4f}](m/s²)'.format(i,
          system_M[i], system_p[:, i], system_a[0, i], system_a[1, i]))
I've found here a simple example using a basic iterator. It works perfectly well, but it doesn't go beyond one-dimensional arrays and gives no information about how to proceed when your array has multiple dimensions and you wish to iterate over only a subset of them, or to specify in which order (e.g. row-wise/column-wise) to iterate. In other words, with this method you can iterate through a multidimensional array, but only in one single way.
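To make the row-wise/column-wise point concrete, here is how that choice is expressed on the Python side, where np.nditer takes an order argument:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# C order walks rows first, Fortran order walks columns first
row_wise = [int(x) for x in np.nditer(a, order='C')]  # [0, 1, 2, 3, 4, 5]
col_wise = [int(x) for x in np.nditer(a, order='F')]  # [0, 3, 1, 4, 2, 5]
```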
Edit: It appears that NpyIter_MultiNew is not related to multidimensional iteration, but to iterating over multiple arrays in one go.
Looking in the documentation, I found this function:
NpyIter* NpyIter_MultiNew(npy_intp nop,
                          PyArrayObject** op,
                          npy_uint32 flags,
                          NPY_ORDER order,
                          NPY_CASTING casting,
                          npy_uint32* op_flags,
                          PyArray_Descr** op_dtypes)
which might be what I need, but I don't even understand the first sentence of the description:
Creates an iterator for broadcasting the nop array objects provided in op, [...]
What are these «nop array objects»? What is their relationship with the op parameter? I know I'm not a native English speaker, but I still have the feeling that this documentation could be clearer than it is.
Then I found other resources like this one, which seems to take a completely different approach (no iterators, so, I suppose, manual iteration), but it doesn't even compile without corrections (I'm still working on it, though).
So, please, does anyone here have experience in this regard and could provide simple examples of how to do that?
Upvotes: 4
Views: 2264
OK, I finally managed to do it. Since the greatest difficulty was finding good introductory material, I am leaving an example as sample code. Here are the functions⁽¹⁾ of the API I used or considered using:
(1): described in the documentation
PyArray_Descr *PyArray_DESCR(PyArrayObject *arr)
is a macro which "returns" the PyArray_Descr *descr field of the C PyArrayObject structure, which is a pointer to the dtype property of the array.
int PyArray_NDIM(PyArrayObject *arr)
is a macro which "returns" the int nd field of the C PyArrayObject structure, which contains the number of dimensions of the array.
npy_intp *PyArray_DIMS(PyArrayObject *arr)
npy_intp *PyArray_SHAPE(PyArrayObject *arr)
are synonymous macros which "return" the npy_intp *dimensions field of the C PyArrayObject structure, which points to a C array containing the sizes of all dimensions of the array, or
npy_intp PyArray_DIM(PyArrayObject *arr, int n)
which "returns" the nth entry of the previous array (i.e. the size of the nth dimension).
npy_intp *PyArray_STRIDES(PyArrayObject *arr)
or
npy_intp PyArray_STRIDE(PyArrayObject *arr, int n)
are macros which respectively "return" the npy_intp *strides field of the C PyArrayObject structure, which points to the (array of) strides of the array, or the nth entry of this array. The strides are the numbers of bytes to skip between "lines" for each dimension of the array. Since the array is contiguous, this shouldn't be needed, but it saves the program from having to multiply the number of cells by their size itself.
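Those byte counts are easy to check from Python, where the same per-dimension values are exposed as the strides attribute:

```python
import numpy as np

a = np.zeros((2, 10))   # C-contiguous float64 array
strides = a.strides
# strides are in bytes: one row step is 10 items * 8 bytes, one column step is 8
print(strides)          # (80, 8)
```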
PyObject *PyArray_NewLikeArray(PyArrayObject *prototype, NPY_ORDER order, PyArray_Descr *descr, int subok)
is a function which creates a new numpy array with the same shape as the prototype passed as parameter. This array is uninitialized.
PyArray_FILLWBYTE(PyObject *obj, int val)
is a macro which calls memset to initialize a given numpy array.
void *PyArray_DATA(PyArrayObject *arr)
is a macro which "returns" the char *data field of the C PyArrayObject structure, which points to the actual data space of the array, laid out like a C array.
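As a cross-check from Python: the address PyArray_DATA would hand back is exposed as arr.ctypes.data, and reading it through ctypes confirms the buffer really is a flat C array of doubles:

```python
import ctypes
import numpy as np

a = np.arange(4.0)  # contiguous float64 buffer: 0., 1., 2., 3.
# a.ctypes.data is the same address PyArray_DATA(arr) yields in C
raw = (ctypes.c_double * a.size).from_address(a.ctypes.data)
print(list(raw))    # [0.0, 1.0, 2.0, 3.0]
```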
Here is the declaration of the PyArrayObject structure, as described in the documentation:
typedef struct PyArrayObject {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
} PyArrayObject;
And here is the sample code:
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <math.h>
#include <numpy/arrayobject.h>

#define G 6.67408E-8L

static void * failure(PyObject *type, const char *message) {
    PyErr_SetString(type, message);
    return NULL;
}

static PyObject *
Py_grav_c(PyObject *self, PyObject *args)
{
    PyArrayObject *p, *M;
    PyObject *a;
    int i, j, k;
    double *pq0, *pq1, *Mq0, *Mq1, *aq0, *aq1, *p0, *p1, *a0, *a1;

    if (!PyArg_ParseTuple(args, "O!O!", &PyArray_Type, &p, &PyArray_Type, &M))
        return failure(PyExc_RuntimeError, "Failed to parse parameters.");

    if (PyArray_DESCR(p)->type_num != NPY_DOUBLE)
        return failure(PyExc_TypeError, "Type np.float64 expected for p array.");

    if (PyArray_DESCR(M)->type_num != NPY_DOUBLE)
        return failure(PyExc_TypeError, "Type np.float64 expected for M array.");

    if (PyArray_NDIM(p) != 2)
        return failure(PyExc_TypeError, "p must be a 2 dimensional array.");

    if (PyArray_NDIM(M) != 1)
        return failure(PyExc_TypeError, "M must be a 1 dimensional array.");

    int K = PyArray_DIM(p, 0);          // Number of dimensions you want
    int L = PyArray_DIM(p, 1);          // Number of bodies in the system
    npy_intp S0 = PyArray_STRIDE(p, 0); // Normally, the arrays should be contiguous
    npy_intp S1 = PyArray_STRIDE(p, 1); // But since they provide this stride info
    npy_intp SM = PyArray_STRIDE(M, 0); // I supposed they might not be (alignment)

    if (PyArray_DIM(M, 0) != L)
        return failure(PyExc_TypeError,
                       "p and M must have the same number of bodies.");

    a = PyArray_NewLikeArray(p, NPY_ANYORDER, NULL, 0);
    if (a == NULL)
        return failure(PyExc_RuntimeError, "Failed to create output array.");
    PyArray_FILLWBYTE((PyArrayObject *)a, 0);

    // For all bodies except the first, which has no previous body
    for (i = 1,
         pq0 = (double *)((char *)PyArray_DATA(p) + S1),
         Mq0 = (double *)((char *)PyArray_DATA(M) + SM),
         aq0 = (double *)((char *)PyArray_DATA((PyArrayObject *)a) + S1);
         i < L;
         i++,
         pq0 = (double *)((char *)pq0 + S1),
         Mq0 = (double *)((char *)Mq0 + SM),
         aq0 = (double *)((char *)aq0 + S1)
         ) {
        // For all previous bodies
        for (j = 0,
             pq1 = (double *)PyArray_DATA(p),
             Mq1 = (double *)PyArray_DATA(M),
             aq1 = (double *)PyArray_DATA((PyArrayObject *)a);
             j < i;
             j++,
             pq1 = (double *)((char *)pq1 + S1),
             Mq1 = (double *)((char *)Mq1 + SM),
             aq1 = (double *)((char *)aq1 + S1)
             ) {
            // For all dimensions, calculate the deltas
            long double d[K], d2 = 0, VVV, M0xVVV, M1xVVV;
            for (k = 0,
                 p0 = pq0,
                 p1 = pq1;
                 k < K;
                 k++,
                 p0 = (double *)((char *)p0 + S0),
                 p1 = (double *)((char *)p1 + S0)) {
                d[k] = *p1 - *p0;
            }
            // Calculate the hypotenuse squared
            for (k = 0, d2 = 0; k < K; k++) {
                d2 += d[k]*d[k];
            }
            // Calculate intermediate results once per pair of bodies (optimization)
            VVV = G * (d2 > 0 ? pow(d2, -1.5) : 1);  // anonymous intermediate result
            // #define LIM 1
            // VVV = G * pow(fmax(d2, LIM), -1.5);   // Variation on the collision case
            M0xVVV = *Mq0 * VVV;                     // anonymous intermediate result
            M1xVVV = *Mq1 * VVV;                     // anonymous intermediate result
            // For all dimensions, calculate the acceleration components
            for (k = 0,
                 a0 = aq0,
                 a1 = aq1;
                 k < K;
                 k++,
                 a0 = (double *)((char *)a0 + S0),
                 a1 = (double *)((char *)a1 + S0)) {
                *a0 += M1xVVV*d[k];
                *a1 -= M0xVVV*d[k];
            }
        }
    }

    /* Return the result; PyArray_NewLikeArray already returned a new
       reference, so no extra Py_INCREF is needed. */
    return a;
}

// exported functions list
static PyMethodDef grav_c_Methods[] = {
    {"grav_c", Py_grav_c, METH_VARARGS,
     "grav_c(p, M)\n"
     "\n"
     "grav_c takes the positions and masses of m bodies in Newtonian"
     " attraction in an n dimensional universe,\n"
     "and returns the accelerations each body undergoes.\n"
     "Input data take the form of one row of float64 per dimension of the"
     " positions (in p) and one row for the masses.\n"
     "It returns an array of the same shape as p for the accelerations."},
    {NULL, NULL, 0, NULL}  // to terminate the list.
};

static char grav_c_doc[] = "Compute attractions between n bodies.";

static struct PyModuleDef grav_c_module = {
    PyModuleDef_HEAD_INIT,
    "grav_c",      /* name of module */
    grav_c_doc,    /* module documentation, may be NULL */
    -1,            /* size of per-interpreter state of the module,
                      or -1 if the module keeps state in global variables. */
    grav_c_Methods
};

PyMODINIT_FUNC
PyInit_grav_c(void)
{
    // import_array() initializes the NumPy C API function table;
    // without it, the first PyArray_* call dereferences a NULL
    // pointer and the program segfaults.
    import_array();
    return PyModule_Create(&grav_c_module);
}
Upvotes: 12
Probably the best way to figure it out is to create the iterator in Python and experiment with it there. This will be slow, but it'll confirm that what you're doing is right. You then use NpyIter_AdvancedNew, using the default parameters wherever possible.
I'm afraid I haven't actually translated this into C code myself - it was taking too long. I therefore suggest you don't accept this answer, since it really only gives a starting point.
My guess would be that any performance improvement will be disappointing given the amount of effort that writing C code takes (especially since writing fast code needs a deeper level of understanding). At the end of the answer I suggest a few simpler alternatives that I'd recommend instead of using the C API.
I've translated a couple of lines from your code as examples:
d = p[:, b:b+1] - p[:, :b]
Becomes
with np.nditer([p[:,b], p[:,:b], None],
               op_axes=[[0,-1], [0,1], [0,1]]) as it:
    for x, y, z in it:
        z[...] = x - y
    d = it.operands[2]
Note that you need to do the slicing of the p array beforehand. I've passed one of the arrays as None. This translates to a NULL pointer in C and means that the array gets created with the appropriate size (using standard broadcasting rules).
In terms of op_axes, the first array is only 1D, so I've said "iterate over axis 0 first; there isn't an axis 1". The second and third arrays are 2D, so I've said "iterate over axis 0 then axis 1".
In Python it infers op_flags automatically. I don't know if it'll do that in C. If not, then they should be:
npy_uint32 op_flags[] = { NPY_ITER_READONLY,
                          NPY_ITER_READONLY,
                          NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE };
The most important bit is that the third operand is allocated by the iterator.
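Those flags can be tried out from Python first; written out explicitly, they reproduce the automatic behaviour (p and b below are just stand-in sample data):

```python
import numpy as np

p = np.arange(10.0).reshape(2, 5)   # sample positions: 2 dimensions x 5 bodies
b = 3
with np.nditer([p[:, b], p[:, :b], None],
               op_flags=[['readonly'],
                         ['readonly'],
                         ['writeonly', 'allocate']],
               op_axes=[[0, -1], [0, 1], [0, 1]]) as it:
    for x, y, z in it:
        z[...] = x - y
    d = it.operands[2]          # the array the iterator allocated for us
print(d.shape)                  # (2, 3)
```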
My view is that you'd want to specify op_dtypes in C as

{ PyArray_DescrFromType(NPY_DOUBLE),
  PyArray_DescrFromType(NPY_DOUBLE),
  NULL }

to force the arrays to be of the right type (the type of the third, allocated array can be worked out from the two inputs). That way you should be able to cast your data pointers to double* in C.
The line:
d2 = (d*d).sum(axis=0)
translates to
with np.nditer([d, None],
               flags=['reduce_ok'],
               op_flags=[['readonly'], ['readwrite','allocate']],
               op_axes=[[1,0], [0,-1]]) as it:
    it.operands[1][...] = 0
    for x, z in it:
        z[...] += x*x
    d2 = it.operands[1]
The most important difference is that it's a reduction (the second, output array is smaller than the input because one of the axes is summed over). Therefore we pass 'reduce_ok' as a flag.
The second array only has one axis, so its op_axes is [0, -1]. The axis of the second array matches axis 1 of the first array, hence the op_axes for the first array is set to [1, 0].
When translating to C, the line it.operands[1][...] = 0 becomes more complicated:
Note that if you want to do a reduction on an automatically allocated output, you must use NpyIter_GetOperandArray to get its reference, then set every value to the reduction unit before doing the iteration loop.
In C I would probably allocate d2 as a zero array first and pass that to the iterator instead.
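That variant is easy to check from Python too: allocate d2 as zeros yourself and pass it in place of None:

```python
import numpy as np

d = np.arange(6.0).reshape(2, 3)    # stand-in for the deltas array
d2 = np.zeros(3)                    # preallocated, already at the reduction unit
with np.nditer([d, d2],
               flags=['reduce_ok'],
               op_flags=[['readonly'], ['readwrite']],
               op_axes=[[1, 0], [0, -1]]) as it:
    for x, z in it:
        z[...] += x * x             # accumulate the reduction in place
print(d2)                           # same as (d * d).sum(axis=0)
```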
Writing C API code for this involves a lot of code, error checking, reference counting, etc. While it should be a "simple" translation (the nditer API is basically the same in both C and Python) it isn't easy.
If I were you, I'd look at using some of the standard tools to speed up Python, for example Numba, NumExpr or Cython. Numba and NumExpr are just-in-time compilers that can do things like avoid allocating intermediate arrays. Cython is a "Python-like" language where you can specify types. To show the first few parts translated into Cython:
def grav3(double[:,:] p, M):
    G = 6.67408e-2  # m³/s²T
    cdef int l = p.shape[1]
    a = np.empty(shape=(2, l))
    a[:, 0] = 0
    cdef double[:,::1] d
    cdef double[::1] d2
    cdef Py_ssize_t b, i, j
    for b in range(1, l):
        # computing the distance between body #b and all previous
        d = np.empty((p.shape[0], b))
        for i in range(d.shape[0]):
            for j in range(b):
                d[i,j] = p[i,b] - p[i,j]
        d2 = np.zeros((d.shape[1]))
        for j in range(d.shape[1]):
            for i in range(d.shape[0]):
                d2[j] += d[i,j]*d[i,j]
            if d2[j] == 0:
                d2[j] = 1
Here I've specified some of the arrays as 1D or 2D double arrays (double[:] or double[:,:]). I've then written the loops out explicitly, which avoids creating intermediates.
Cython generates C code which gets PyArray_DATA and then uses PyArray_STRIDES to work out where to access in a 2D array. You might find this easier than using iterators; you could examine the code Cython generates to see how it does it. There are also PyArray_GetPtr functions in NumPy that do this kind of access.
Upvotes: 0