Reputation: 6474
I have a Java program that visits some websites, converts each site's HTML into XML, runs some XQuery commands on the XML, and finally stores the result as CSV, which is then uploaded to cloud file storage (such as Amazon S3).
Now I want to split the work across multiple threads so that it finishes faster, but how do I determine the optimal number of threads for this work?
I want to determine the number of threads I should allow for the different types of Amazon EC2 instances. Is there a library or framework that can help me with this?
Or do I have to run the code manually on an Amazon EC2 instance, keep changing the number of threads, and measure the time taken?
Specifically, I want to balance the total time taken to process all the work against the number of threads allowed to run simultaneously. If I could clearly see this correlation for servers with different CPU/RAM capacities, that would be great. Any advice or guidance would be appreciated.
Upvotes: 2
Views: 4065
Reputation: 30022
To find the number of logical cores available you can use:
int processors = Runtime.getRuntime().availableProcessors();
and create a thread pool with that many threads. See also:
Finding Number of Cores in Java
Java: How to scale threads according to cpu cores?
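As a minimal sketch of the suggestion above, the pool can be built with `Executors.newFixedThreadPool` and sized from `availableProcessors()`. The class name `CoreSizedPool` and the method `processSite` are hypothetical placeholders for the asker's fetch/convert/query/upload step, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CoreSizedPool {
    // Hypothetical stand-in for one unit of work (fetch, convert, query, upload).
    static String processSite(String url) {
        return "processed " + url;
    }

    // Submits one task per URL to a pool with one thread per logical core
    // and collects the results in submission order.
    static List<String> processAll(List<String> urls) throws Exception {
        int processors = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(processors);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> processSite(url);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that task to finish
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting work; submitted tasks have already completed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(List.of("https://example.com/a", "https://example.com/b")));
    }
}
```

One thread per logical core is only a starting point; for work that spends most of its time waiting on the network, a larger pool may do better.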
Upvotes: 1
Reputation: 66876
The type of work you describe is almost certainly I/O bound: most of the time is spent waiting for data to be downloaded or uploaded. If so, your goal is simply to make full use of the upload/download bandwidth.
In that case, the optimal number of threads will be higher than the number of physical cores on the machine (the core count would be the right starting point for a CPU-bound process).
It's hard to say from this information what the optimal number of threads will be, as it depends on how much you're downloading and how fast the link is. Try doubling the number of threads until performance starts to suffer.
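The doubling approach could be automated roughly as follows. This is a sketch under stated assumptions: `simulatedJob` is a hypothetical placeholder that sleeps to imitate I/O wait, and the job count and upper bound of 64 threads are arbitrary. On a real EC2 instance the placeholder would be replaced by the actual download/convert/upload step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ThreadCountSweep {
    // Hypothetical placeholder for one job; the sleep simulates waiting on the network.
    static void simulatedJob() throws InterruptedException {
        Thread.sleep(20);
    }

    // Runs `jobs` simulated jobs on a pool of `threads` threads and returns elapsed milliseconds.
    static long timeRun(int threads, int jobs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            tasks.add(() -> { simulatedJob(); return null; });
        }
        long start = System.nanoTime();
        pool.invokeAll(tasks); // blocks until every task has finished
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        int jobs = 64; // arbitrary batch size for the experiment
        long previous = Long.MAX_VALUE;
        // Double the thread count until a run is no faster than the one before it.
        for (int threads = 1; threads <= 64; threads *= 2) {
            long elapsed = timeRun(threads, jobs);
            System.out.printf("%d threads: %d ms%n", threads, elapsed);
            if (elapsed >= previous) {
                break;
            }
            previous = elapsed;
        }
    }
}
```

Single timings are noisy, so repeating each measurement a few times and comparing averages would give a more trustworthy stopping point.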
Upvotes: 4
Reputation: 55856
I think you should profile your app with a single thread using jhat, MAT, etc., and then decide how many threads to run based on the machine's configuration. This will give you a general idea of how expensive each thread is. You can then run a load test (for example, 10,000 items queued up against 10 threads) to validate the limits you came up with, and tune accordingly.
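If the single-thread run also gives rough timings for how long a job waits on I/O versus how long it computes, a commonly cited heuristic (from "Java Concurrency in Practice") turns them into a first guess: threads = cores * target CPU utilization * (1 + wait time / compute time). The sketch below assumes those timings are available; the class name `PoolSizeEstimate` and the 900 ms / 100 ms figures are made up for illustration, and the result is only a starting value to validate with a load test.

```java
class PoolSizeEstimate {
    // Heuristic: cores * targetUtilization * (1 + waitMs / computeMs), rounded up, never below 1.
    static int estimateThreads(int cores, double targetUtilization, double waitMs, double computeMs) {
        if (cores < 1 || computeMs <= 0 || waitMs < 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double raw = cores * targetUtilization * (1.0 + waitMs / computeMs);
        return Math.max(1, (int) Math.ceil(raw));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurements: each job waits 900 ms on I/O and computes for 100 ms.
        int threads = estimateThreads(cores, 1.0, 900, 100);
        System.out.println("Suggested starting pool size: " + threads);
    }
}
```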
Upvotes: 2