Reputation: 301
I have an application in Java in which I need to use multithreading.
I have a list of IDs, each of which is the primary key for tables stored in DynamoDB.
Say, the list is :
| ID_1 | ID_2 | ID_3 | ID_4|.......| ID_n|
Now I want multiple threads to read these IDs and do the following for each ID:
Each thread should take an ID and query the DynamoDB tables (there are two DynamoDB tables for which the ID is the primary key).
The result of querying each DynamoDB table should be stored in a separate file.
Essentially, Thread_1 should pick up an ID, say ID_1, and query the DynamoDB tables DDB_1 and DDB_2. The result of querying DDB_1 should go in File_1 and the result of querying DDB_2 should go in File_2. This needs to be done by all the threads. Finally, when all threads have completed execution, I should have two files, File_1 and File_2, containing the query results from all the threads.
I have come up with a solution in which all producer threads (the threads that get the query results from DynamoDB) queue their DDB_1 results to a single consumer thread, which writes them to a file, say File_1. Similarly, all producer threads write their DDB_2 results to a second queue, and a second consumer thread writes them to File_2.
Do you see any flaw in the approach above? Is there a better way to apply multithreading in this case?
Upvotes: 2
Views: 2275
Reputation: 576
This is what you want to achieve:
ID_1 -> Thread1 -> Query DB1 -> ConsumerSingleton -> Write data to File 1
                -> Query DB2 -> ConsumerSingleton -> Write data to File 2
ID_2 -> Thread2 -> Query DB1 -> ConsumerSingleton -> Write data to File 1
                -> Query DB2 -> ConsumerSingleton -> Write data to File 2
ID_3 -> Thread3 -> Query DB1 -> ConsumerSingleton -> Write data to File 1
                -> Query DB2 -> ConsumerSingleton -> Write data to File 2
..
..
ID_N -> ThreadN -> Query DB1 -> ConsumerSingleton -> Write data to File 1
                -> Query DB2 -> ConsumerSingleton -> Write data to File 2
Since you are using a single consumer object, you don't have to synchronize the write operations on File 1 and File 2. However, you do have to synchronize the operation/method through which your threads hand their results to the consumer's collection. You can use a thread-safe collection such as ConcurrentHashMap in your consumer class to collect the results from the different threads.
Also, since you are reading rows from DB1 and DB2 by unique IDs, row-level lock contention should not occur when multiple threads access the tables. If that is not the case and two threads try to read the row with the same ID, contention can happen.
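To illustrate the hand-off described above, here is a minimal sketch that uses a BlockingQueue (already thread-safe, so the producers need no extra locking) rather than a ConcurrentHashMap. Everything named here is an assumption for illustration: the class QueueToSingleWriter, the queryFn parameter standing in for the real DynamoDB call, and the Writer standing in for File 1. The second table would get its own queue and consumer in the same way.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper: many producers, one queue, one consumer that owns the output.
class QueueToSingleWriter {
    // Sentinel ("poison pill") telling the consumer that no more results will arrive.
    private static final String DONE = new String("DONE");

    // queryFn is a placeholder for the real table lookup; out stands in for the file.
    static void run(List<String> ids, Function<String, String> queryFn, Writer out, int producers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // The only thread that ever touches 'out', so the writes need no locking.
        Thread consumer = new Thread(() -> {
            try {
                // Identity comparison, so a real result equal to "DONE" is not mistaken for the sentinel.
                for (String r = queue.take(); r != DONE; r = queue.take()) {
                    out.write(r);
                    out.write(System.lineSeparator());
                }
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producers: one task per ID, executed by a fixed pool of worker threads.
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (String id : ids) {
            pool.execute(() -> queue.add(queryFn.apply(id)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
            queue.put(DONE);   // all producers have finished, so this is the last element
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StringWriter file1 = new StringWriter(); // stands in for a FileWriter on File 1
        run(List.of("ID_1", "ID_2", "ID_3"), id -> "DB1 result for " + id, file1, 2);
        System.out.print(file1);
    }
}
```

The order of the lines in the output depends on thread scheduling, so nothing downstream should rely on it.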
Upvotes: 1
Reputation: 192
If I understand correctly, you want two threads that each query a DB table and write the results to a file. See below.
APPLICATION
|
|-->THREAD --> DB_1 --> file1
|
|-->THREAD --> DB_2 --> file2
First off, this should be perfectly fine: you are not reading from and writing to the same data, so this is thread-safe. One way to do this is to make a class for each thread (just an example). Do this by implementing Runnable, then place all the code for connecting to a DB in the run method. Long example: http://www.tutorialspoint.com/java/java_multithreading.htm
class Thread1 implements Runnable {
    @Override
    public void run() {
        // Connect to the DB, run the query, and write the result here.
    }
}
Call it by wrapping the Runnable in a Thread (a Runnable has no start() method of its own):
Thread t = new Thread(new Thread1());
t.start();
This should work fine as long as you are not editing the IDs while you are reading them in one of these threads.
The synchronized keyword restricts a method to a single thread at a time. This is necessary, for example, when several threads write to the same file, as they would otherwise interrupt each other.
public synchronized void write(String text, File file1, File file2) {
    // Write to the files here (File is java.io.File).
}
Call this like a normal method in your threads. This does NOT guarantee the order in which the threads access the method; in this example it is first come, first served.
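As a rough sketch of the synchronized approach, the class below lets several threads share one writer object. The names SynchronizedLineWriter, writeLine and runDemo are made up for this example, and a Writer stands in for the real file so the sketch has no file-system dependency.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared writer: every thread calls writeLine on the same instance.
class SynchronizedLineWriter {
    private final Writer out;

    SynchronizedLineWriter(Writer out) {
        this.out = out;
    }

    // Only one thread at a time can be inside this method on a given instance,
    // so the two writes that make up a line are never interleaved with another thread's.
    public synchronized void writeLine(String text) {
        try {
            out.write(text);
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts the given number of threads, each writing linesPerThread lines, and waits for them.
    static void runDemo(Writer out, int threads, int linesPerThread) {
        SynchronizedLineWriter writer = new SynchronizedLineWriter(out);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < linesPerThread; i++) {
                    writer.writeLine("thread-" + id + " line-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter(); // stands in for a FileWriter
        runDemo(out, 4, 3);
        System.out.print(out);
    }
}
```

Each line arrives intact, but the order in which lines from different threads appear varies from run to run.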
Upvotes: 1
Reputation: 718678
Do you see any flaw in the approach above?
I can't spot one. But of course, I can only comment based on your high-level description of your algorithm. There will be right and wrong ways to implement it.
Is there a better way to apply multi-threading in this case?
It is hard to say. But I can't think of any alternative that is obviously better. There are (no doubt) alternatives, but the only way you could objectively determine which is best [1] would be to implement various alternatives and benchmark them.
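Note that the bottlenecks for this application are likely to be the rate at which you can query the database and the rate at which you can write the results to the files.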
(Probably, the former will dominate.) Since both are going to be limited by "external" factors (e.g. disc I/O, networking, load on the database CPUs) you will most likely need to "tune" the number of worker threads you use.
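One way to approach that tuning is to time the same workload with different pool sizes. The sketch below is only an illustration: PoolSizeBenchmark and timeWithThreads are invented names, and the sleeping task is a stand-in for the real query-and-write work, so the numbers it prints say nothing about DynamoDB itself.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical harness for comparing worker-thread counts.
class PoolSizeBenchmark {
    // Runs the task once per ID on a fixed pool of the given size
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeWithThreads(int threads, List<String> ids, Consumer<String> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (String id : ids) {
            pool.execute(() -> task.accept(id));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        List<String> ids = IntStream.rangeClosed(1, 200)
                .mapToObj(i -> "ID_" + i)
                .collect(Collectors.toList());

        // Placeholder workload: sleeps to imitate I/O latency instead of querying a real table.
        Consumer<String> fakeWork = id -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(n + " threads: " + timeWithThreads(n, ids, fakeWork) + " ms");
        }
    }
}
```

With a real workload, the elapsed time typically stops improving once the external limit is reached, and that point is a reasonable starting value for the pool size.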
1 - I assume you mean the one that has the best throughput.
Upvotes: 0