Harrison

Reputation: 830

MPI locking for sqlite (python)

I am using mpi4py to parallelize a project. Below is very basic pseudocode for my program:

Load list of data from sqlite database
Based on comm.rank and comm.size, select the chunk of data to process

Process data...

use MPI.Gather to pass all of the results back to root

if root:
    iterate through results and save to sqlite database
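For reference, here is a minimal sketch of that gather-based flow. It assumes mpi4py's lowercase `comm.gather` (which pickles arbitrary Python objects), a hypothetical `results(item, value)` table, and a placeholder `process` function standing in for the real work; `comm` would normally be `MPI.COMM_WORLD`.

```python
import sqlite3


def select_chunk(data, rank, size):
    """Return the contiguous slice of `data` that belongs to `rank` out of `size` ranks."""
    # Chunk sizes differ by at most one: the first `extra` ranks get one more item.
    base, extra = divmod(len(data), size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return data[start:stop]


def process(item):
    # Hypothetical stand-in for the real per-item computation.
    return item * item


def gather_and_save(comm, data, db_path):
    """Each rank processes its chunk; only rank 0 touches the database."""
    rank, size = comm.Get_rank(), comm.Get_size()
    local = [(item, process(item)) for item in select_chunk(data, rank, size)]
    # comm.gather returns a list with one entry per rank on the root, None elsewhere.
    gathered = comm.gather(local, root=0)
    if rank != 0:
        return None
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical table layout used only for this sketch.
        conn.execute("CREATE TABLE IF NOT EXISTS results (item INTEGER, value INTEGER)")
        for chunk in gathered:
            conn.executemany("INSERT INTO results VALUES (?, ?)", chunk)
        conn.commit()
    finally:
        conn.close()
    # Number of rows written, so the caller on rank 0 can check it.
    return sum(len(chunk) for chunk in gathered)
```

Launched with something like `mpiexec -n 4 python script.py`, each rank would call `gather_and_save(MPI.COMM_WORLD, data, "results.db")` after `from mpi4py import MPI`; only rank 0 ever opens the database.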

I would like to eliminate the call to MPI.Gather by having each process write its own results directly to the database, so that my pseudocode looks like this:

Load list of data
Select chunk of data
Process data
Save results
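If each rank really must save its own results, one way to keep SQLite from ever seeing two writers at once is to have the ranks take turns, with an MPI barrier between turns. This is a sketch under assumptions, not a built-in feature of SQLite or MPI-IO: `comm` is an mpi4py communicator, `rows` is a list of `(item, value)` tuples, and the `results` table is hypothetical.

```python
import sqlite3


def save_in_turns(comm, rows, db_path):
    """Let every rank write its own rows, but only one rank at a time.

    Rank r writes during round r; the Barrier at the end of each round keeps
    the other ranks waiting, so SQLite never sees two writers at once.
    """
    rank, size = comm.Get_rank(), comm.Get_size()
    for turn in range(size):
        if turn == rank:
            conn = sqlite3.connect(db_path)
            try:
                # Hypothetical table layout used only for this sketch.
                conn.execute("CREATE TABLE IF NOT EXISTS results (item INTEGER, value INTEGER)")
                conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
                conn.commit()
            finally:
                conn.close()
        # Every rank, including this round's writer, must reach the Barrier in every round.
        comm.Barrier()
```

This removes the gather, but the saves still run one rank after another, and no rank gets past the first barrier until all ranks have finished processing, so it is unlikely to be faster than letting rank 0 write everything.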

I expect this would drastically improve my program's performance. However, I am not sure how to accomplish it. I have searched on Google, but the only relevant thing I could find is MPI-IO. Is it possible to use MPI-IO to write to a database, specifically using Python, SQLite, and mpi4py? If not, are there any alternatives for writing concurrently to an SQLite database?

EDIT:

As @CL pointed out in a comment, SQLite does not support concurrent writes: only one connection can write to a database at a time. So let me ask my question a little differently: is there a way to lock writes to the database so that other processes wait until the lock is released before writing? I know SQLite has its own locking modes, but they seem to make insertions fail rather than block. I have seen something like this in Python threading, but I haven't been able to find anything online about doing it with MPI.
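On the SQLite side, Python's `sqlite3.connect` accepts a `timeout` argument (5 seconds by default): a connection that finds the database locked keeps retrying for that long before raising `sqlite3.OperationalError`, which is close to the blocking behaviour asked about. A minimal sketch, assuming a hypothetical `results` table and an arbitrary `wait_seconds` value; opening the transaction with `BEGIN IMMEDIATE` makes the wait happen before any row is inserted.

```python
import sqlite3


def save_with_busy_wait(rows, db_path, wait_seconds=60.0):
    """Insert rows, waiting for other writers instead of failing immediately.

    `timeout` makes SQLite retry for up to `wait_seconds` while another
    connection holds the lock; only after that is OperationalError raised.
    """
    # isolation_level=None disables implicit transactions, so BEGIN below is explicit.
    conn = sqlite3.connect(db_path, timeout=wait_seconds, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes the write lock up front (waiting if necessary),
        # so a lock conflict is waited on before any row is written.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("CREATE TABLE IF NOT EXISTS results (item INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    except BaseException:
        # Undo a half-finished transaction before re-raising.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

The SQLite documentation warns that its file locking may be unreliable on network file systems such as NFS, which are common on clusters, so this approach is safest when all writers share a local disk.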

Upvotes: 2

Views: 748

Answers (1)

Don Kirkby

Reputation: 56590

I would suggest you pass your results back to the root process, and let the root process write them to the SQLite database. The pseudocode would look something like this:

load list of data
if rank == 0:
    for _ in range(len(data)):
        result = receive from any worker
        save result
else:
    select chunk of data
    for each item in chunk:
        process item
        send result to rank 0
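A runnable version of the pseudocode above might look like the following sketch. It assumes at least two ranks, a placeholder `process` function, and a hypothetical `results` table, and it relies on mpi4py's `comm.recv()` defaulting to `source=ANY_SOURCE`, so rank 0 takes whichever result arrives first.

```python
import sqlite3


def process(item):
    # Hypothetical stand-in for the real per-item computation.
    return item * item


def run(comm, data, db_path):
    """Workers (ranks 1..size-1) send one result per item; rank 0 saves each on arrival."""
    rank, size = comm.Get_rank(), comm.Get_size()
    workers = size - 1  # rank 0 only writes, so at least 2 ranks are needed
    if rank == 0:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS results (item INTEGER, value INTEGER)")
            # One message per data item, matching the pseudocode's loop count.
            for _ in range(len(data)):
                # mpi4py's recv() defaults to source=ANY_SOURCE, tag=ANY_TAG.
                item, value = comm.recv()
                conn.execute("INSERT INTO results VALUES (?, ?)", (item, value))
            conn.commit()
        finally:
            conn.close()
    else:
        # Static round-robin split of the items over the worker ranks.
        for item in data[rank - 1::workers]:
            comm.send((item, process(item)), dest=0)
```

Each row is inserted as soon as it arrives but committed once at the end; committing more often makes results durable sooner at the cost of extra disk syncs.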

The advantage over gathering is that rank 0 can save each result as soon as it arrives, instead of waiting until every process has finished. There is an mpi4py example that shows how to spread tasks out over multiple workers when there are lots of tasks and the processing time varies widely.
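The dynamic approach described above, where idle workers ask rank 0 for the next task, could be sketched as follows. This is not the mpi4py example itself: it assumes each message carries the sender's rank (so no `MPI.Status` object is needed), uses `None` as a hypothetical stop signal (so data items must not be `None`), and reuses a placeholder `process` function and a hypothetical `results` table.

```python
import sqlite3

STOP = None  # hypothetical sentinel telling a worker there is no more work


def process(item):
    # Hypothetical stand-in for work whose duration varies per item.
    return item * item


def root(comm, data, db_path):
    """Hand out one item at a time; save each result as it comes back."""
    pending = list(reversed(data))  # pop() from the end yields items in original order
    active = comm.Get_size() - 1
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS results (item INTEGER, value INTEGER)")
        while active:
            # Every message is (worker_rank, result); result is None on a worker's first request.
            src, result = comm.recv()
            if result is not None:
                conn.execute("INSERT INTO results VALUES (?, ?)", result)
            if pending:
                comm.send(pending.pop(), dest=src)
            else:
                comm.send(STOP, dest=src)
                active -= 1
        conn.commit()
    finally:
        conn.close()


def worker(comm):
    """Ask for work, process it, and send the result back as the next request."""
    rank = comm.Get_rank()
    comm.send((rank, None), dest=0)
    while True:
        item = comm.recv(source=0)
        if item is STOP:
            break
        comm.send((rank, (item, process(item))), dest=0)
```

Rank 0 would call `root(comm, data, "results.db")` and every other rank `worker(comm)`. Because a worker receives a new item only after returning the previous one, fast workers end up processing more items than slow ones.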

Upvotes: 1
