Reputation: 1099
I have written a Python C extension. It's working fine, but now, for more efficient execution, I need to write a multithreaded/parallel version of the same extension.
Can you please tell me how to write Python C extension code that runs on multiple cores at the same time?
I have been stuck on this for more than a day. Please help.
Upvotes: 0
Views: 387
Reputation: 1235
Maybe too late, but I hope this helps other people :)
The simplest way to run a C extension in parallel is to use the OpenMP API. From Wikipedia:
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems.
For example, consider this piece of code:
int i;
for (i = 0; i < 10; i++)
{
    printf("%d ", i);
}
result:
0 1 2 3 4 5 6 7 8 9
We can make it parallel by placing the #pragma omp parallel for compiler directive before the for statement:
int i;
#pragma omp parallel for
for (i = 0; i < 10; i++)
{
    printf("%d ", i);
}
result (the order depends on thread scheduling and can differ from run to run), for example:
0 1 5 8 9 2 6 4 3 7
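The loop above only prints, so the only visible effect of running it in parallel is the changed order. When the iterations instead accumulate into one shared value, the updates have to be combined safely, which is what the reduction clause is for. A minimal sketch, assuming a hypothetical helper named sum_of_squares (the name is not from the original code):

```c
#include <assert.h>

/* Hypothetical helper: sums i*i for i in [0, n).
 * Each thread accumulates a private partial sum; OpenMP adds the
 * partial sums together when the loop ends, so the result does not
 * depend on how iterations were scheduled.
 * If the file is compiled without -fopenmp, the pragma is ignored
 * and the loop simply runs serially. */
long long sum_of_squares(int n)
{
    assert(n >= 0);
    long long total = 0;
    int i;
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (long long)i * i;
    }
    return total;
}
```

In an extension, a function like this would be called from the wrapper function that Python invokes; for n = 10 it returns 285 whether or not OpenMP is enabled.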
To enable OpenMP in gcc, you need to specify the -fopenmp compile-time flag. Example:
gcc -fPIC -Wall -O3 costFunction.c -o costFunction.so -shared -fopenmp
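If the flag is left out, gcc ignores the OpenMP pragmas and the code runs serially without any error, so it can be worth checking that OpenMP is really active. A small sketch, assuming a hypothetical helper named available_threads; it relies on the _OPENMP macro, which the compiler defines only when OpenMP support is enabled:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only available/needed when OpenMP is enabled */
#endif

/* Hypothetical helper: returns how many threads a parallel region
 * may use, or 1 when the file was compiled without OpenMP support. */
int available_threads(void)
{
    int n = 1;
#ifdef _OPENMP
    n = omp_get_max_threads();
#endif
    assert(n >= 1);
    return n;
}
```

Calling this once from the extension and printing the result is a quick way to confirm that the build actually picked up -fopenmp.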
You can learn OpenMP from HERE.
There are other ways, such as pthreads, but they are very low-level.
OpenMP vs. pthreads: an example from HERE, written in C++.
serial C++ code:
void sum_st(int *A, int *B, int *C){
    int end = 10000000;
    for(int i = 0; i < end; i++)
        A[i] = B[i] + C[i];
}
pthread solution:
#include <pthread.h>
#include <stdlib.h>

// Arguments handed to each worker thread.
struct params {
    int *A;
    int *B;
    int *C;
    int tid;       // index of this thread, 0 .. nthreads-1
    int size;      // total number of elements
    int nthreads;  // total number of threads
};

// Each thread adds its own contiguous chunk of B and C into A.
void *compute_parallel(void *_p){
    params *p = (params*) _p;
    int tid = p->tid;
    int chunk_size = (p->size / p->nthreads);  // assumes size is divisible by nthreads
    int start = tid * chunk_size;
    int end = start + chunk_size;
    for(int i = start; i < end; i++) p->A[i] = p->B[i] + p->C[i];
    return NULL;
}

void sum_mt(int *A, int *B, int *C){
    const int nthreads = 4;
    int size = 10000000;
    pthread_t threads[nthreads]; //array to hold thread information
    params *thread_params = (params*) malloc(nthreads * sizeof(params));
    // Fill in one parameter block per thread and start it.
    for(int i = 0; i < nthreads; i++){
        thread_params[i].A = A;
        thread_params[i].B = B;
        thread_params[i].C = C;
        thread_params[i].tid = i;
        thread_params[i].size = size;
        thread_params[i].nthreads = nthreads;
        pthread_create(&threads[i], NULL, compute_parallel, (void*) &thread_params[i]);
    }
    // Wait for every thread to finish before releasing the parameters.
    for(int i = 0; i < nthreads; i++){
        pthread_join(threads[i], NULL);
    }
    free(thread_params);
}
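Note that the pthread version computes chunk_size as size / nthreads, which covers every element here only because 10000000 is divisible by 4. For other sizes the trailing elements would be skipped. One possible way to split the range, sketched in plain C with a hypothetical helper named chunk_bounds (not part of the original example):

```c
#include <assert.h>

/* Hypothetical helper: computes the half-open range [*start, *end)
 * that thread `tid` should process when `size` elements are shared
 * among `nthreads` threads. The first (size % nthreads) threads each
 * take one extra element, so every element is covered exactly once. */
void chunk_bounds(int size, int nthreads, int tid, int *start, int *end)
{
    assert(size >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = size / nthreads;               /* minimum chunk length */
    int rem = size % nthreads;                /* leftover elements */
    int extra_before = tid < rem ? tid : rem; /* extras given to earlier threads */
    *start = tid * base + extra_before;
    *end = *start + base + (tid < rem ? 1 : 0);
}
```

With 10 elements and 4 threads this yields the ranges [0,3), [3,6), [6,8) and [8,10). This is the kind of bookkeeping that OpenMP does for you.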
OpenMP solution:
#pragma omp parallel for
for(int i = 0; i < 10000000; i++)
    A[i] = B[i] + C[i];
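For comparison with sum_st and sum_mt, the OpenMP loop can be wrapped in a function of its own. This is a sketch in plain C; the name sum_omp and the explicit length parameter n are assumptions added here so the function is not tied to a hard-coded size:

```c
#include <assert.h>

/* Hypothetical wrapper: A[i] = B[i] + C[i] for i in [0, n).
 * Iterations are independent and each writes a different A[i],
 * so no synchronization is needed beyond the pragma itself.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
void sum_omp(int *A, const int *B, const int *C, int n)
{
    assert(n >= 0);
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        A[i] = B[i] + C[i];
    }
}
```

Called as sum_omp(A, B, C, 10000000), it does the same work as the pthread version without any of the manual thread setup.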
Upvotes: 2