Nitish Upreti
Nitish Upreti

Reputation: 6550

Best way to distribute database read jobs among Java threads

I have a MySQL database with a large number of rows.

I want to initialize multiple Threads (each with its own database connection) in Java and read/print the data simultaneously.

How to partition data between multiple threads so as no two Threads read the same record? What strategies can be used?

Upvotes: 2

Views: 2117

Answers (3)

Santosh
Santosh

Reputation: 17923

If the large dataset has an integer primary key, then one of the approaches would be as follows

  • Get the count of rows using the same select query.
  • Divide the entire dataset into equal number of partitions
  • Assign each partition to each thead. Each thread will have its own select query with primary key value range as constraint.

Note: the following issues with this approach

  1. You (fire number of threads + 1) queries to database. So performance might be a problem.
  2. All the partitions may not be equal (as there will be some ids which are deleted).

This approach is simple and makes sure that a row is strictly processed by only thread.

Upvotes: 1

Antoniossss
Antoniossss

Reputation: 32537

It depends on what kind of work are your threads going to do. For example i usually execute single SELECT for some kind of large dataset, add tasks to thread safe task queue and submit workers which picks up proper task from queue to process. I usually write to DB without synchronisation, but that depends on size of unit of work, and DB constrains (like unique keys etc). Works like charm. Other method would be to just simply run multiple threads and let them work on their own. I strongly disadvice usage of some fancy LIMIT, OFFSET however. It still requires DB to fetch MORE data rows than it will actually return from query.

EDIT: As you have added comment that you have same data, than yes, my solution is what are you looking for

  1. Get dataset by single query
  2. Add data to queue
  3. Lunch your threads (by executors or new threads)
  4. Pick data from queue and process it.

Upvotes: 2

SANN3
SANN3

Reputation: 10099

You can use a singleton class to maintain already read rows. So every thread can access the row number from that singleton.

Otherwise you can use static AtomicInteger variable from a common class. Every time threads will call getAndIncrement method. So you can partition data between the threads.

Upvotes: 0

Related Questions