Bulva
Bulva

Reputation: 1248

Sharing cx_Oracle cursor between threads or processes

I have a database in Oracle. I need to export some data, process them and save the data into the file. In the table is oracle sdo_geom and I need convert the geometry into the WKT format. The geometry is really big, a big polygons with thousand of vertices. I am using SDO_UTIL.TO_WKTGEOMETRY() function but it takes too long. Selecting (converting the geometry) into the database is a bottleneck in this case.

I think about multihreading or multiprocessing in Python. The scenario should looks something like this:

  1. Create connection to the database
  2. Create cx_Oracle.cursor
  3. Then start multiple threads or processes sharing the cursor
  4. In every thread (or process) I will select the data from the database (the same table) by chunks
  5. Then the all the data will be send to the thread or process responsible for processing the data and saving to the file

I am selecting the data using fetchmany() via chunks:

def get_row_chunks(self):
    while True:
        rows = self.cursor.fetchmany()
        if not rows:
            break
        yield rows

So my question is, is it possible to solve this problem using multithreading or multiprocessing and how (using cx_Oracle) or psycopg2 for PostgreSQL I think it doesn't matter. The cursor has the same behaviour in these libraries I think. Or is it not possible easily share cursor between threads or processes.

Possible solution will be pregenerate the WKT geometry in the database but this looks to me more general solution.

Upvotes: 0

Views: 1712

Answers (1)

Anthony Tuininga
Anthony Tuininga

Reputation: 7096

You cannot perform more than one action on a cursor or connection in cx_Oracle simultaneously. So you can't do simultaneous fetches, for example. If you attempt to do so you will discover that the fetches block. As such, I would suggest that you simply perform the fetch in one thread and pass the data fetched off to another thread or threads for processing.

The other possibility is to create a pool which permits you to create multiple connections and use them in multiple threads (use the threaded=True parameter when you create the pool). You can then use each connection to query a different portion of the data.

Upvotes: 2

Related Questions