Reputation: 116

Concurrency, how to create an efficient actor setup?

Alright, so I have never done intensive concurrent operations like this before. There are three main parts to this algorithm.

This all starts with a Vector of around 1 Million items. Each item gets processed in 3 main stages.

Task 1: Make an HTTP request and convert the received data into a map of around 50 entries.
Task 2: Receive the map and do some computations to generate a class instance based on the info found in the map.
Task 3: Receive the class instance and generate/add to multiple output files.

I initially started out by running Task 1 concurrently on 64K entries across 64 threads (1024 entries per thread), creating the threads in a for loop.

This worked well and was relatively fast, but I keep hearing about actors and how they are heaps better than basic Java threads/thread pools. I've created a few actors, but I don't know where to go from here.

Basically:
1. Are actors the right way to achieve fast concurrency for this specific set of tasks, or is there another way I should go about it?
2. How do you know how many threads/actors are too many? Specifically in Task 1, how do you know what the limit on the number of simultaneous connections is (I'm on a Mac)? Is there a golden rule to follow? How many thread pools vs. how many threads per pool? And the actor equivalents?
3. Is there any code I can look at that uses actors in a similar fashion? All the code I'm seeing either gets an actor to print hello world, or is super complex stuff.

Upvotes: 0

Answers (3)

lmm

Reputation: 17431

1) It sounds like most of your steps aren't stateful, in which case actors add complication for no real benefit. If you need to coordinate multiple tasks in a mutable way (e.g. for generating the output files) then actors are a good fit for that piece. But the HTTP fetches should probably just be calls to some nonblocking HTTP library (e.g. spray-client - which will in fact use actors "under the hood", but in a way that doesn't expose the statefulness to you).
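To make the "stateless steps don't need actors" point concrete, here is a minimal sketch in plain Java using `CompletableFuture`. Everything in it is an assumption for illustration: `fetch` is a stand-in that returns canned data where a real nonblocking HTTP client call would go, and `Item`, `compute` and `run` are hypothetical names, not anything from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class StatelessPipeline {
    // Hypothetical result type for Task 2.
    static final class Item {
        final String id;
        final int fieldCount;
        Item(String id, int fieldCount) { this.id = id; this.fieldCount = fieldCount; }
        @Override public String toString() { return id + ":" + fieldCount; }
    }

    // Task 1 stand-in (assumption): a real version would call a nonblocking
    // HTTP client here and parse the response into a map.
    static CompletableFuture<Map<String, String>> fetch(String id) {
        return CompletableFuture.completedFuture(Map.of("id", id, "payload", "data-" + id));
    }

    // Task 2: a pure function from map to object, so it needs no shared state.
    static Item compute(Map<String, String> fields) {
        return new Item(fields.get("id"), fields.size());
    }

    // Chain Task 1 -> Task 2 per item; the CPU work runs on cpuPool.
    static List<Item> run(List<String> ids, ExecutorService cpuPool) {
        List<CompletableFuture<Item>> futures = ids.stream()
                .map(id -> fetch(id).thenApplyAsync(StatelessPipeline::compute, cpuPool))
                .collect(Collectors.toList());
        // join() waits for each result; the output order matches the input order.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            System.out.println(run(List.of("a", "b", "c"), cpuPool));
        } finally {
            cpuPool.shutdown();
        }
    }
}
```

Note that this sketch starts every fetch at once, which would not be sensible for a million items; a real run would need some cap on in-flight requests.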

2) With blocking threads you pretty much have to experiment and see how many you can run without consuming too many resources. Worry about how many simultaneous connections the remote system can handle rather than hitting any "connection limits" on your own machine (it's possible you'll hit the file descriptor limit but if so best practice is just to increase it). Once you figure that out, there's no value in having more threads than the number of simultaneous connections you want to make.
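As a rough illustration of "no more threads than simultaneous connections", here is a sketch that sizes a fixed thread pool to the connection limit you settled on. `blockingFetch` is a hypothetical stand-in that sleeps instead of making a real HTTP call, and the `inFlight`/`peak` counters exist only to show that concurrency never exceeds the pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedFetcher {
    // Instrumentation only: tracks how many fetches run at the same time.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Stand-in for a blocking HTTP call (assumption, no real network I/O).
    static String blockingFetch(String id) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulate waiting on the remote server
            return "body-" + id;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // One thread per allowed connection; extra tasks wait in the pool's queue.
    static List<String> fetchAll(List<String> ids, int maxConnections) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConnections);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String id : ids) {
                tasks.add(() -> blockingFetch(id));
            }
            List<String> bodies = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the input order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                bodies.add(f.get());
            }
            return bodies;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 100; i++) ids.add(String.valueOf(i));
        System.out.println(fetchAll(ids, 8).size() + " fetched, peak concurrency " + peak.get());
    }
}
```

The value you pass as `maxConnections` is the number you would find by experiment, as described above.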

As others have said, with nonblocking everything you should probably just have a number of threads similar to the number of CPU cores (I've also heard "2x number of CPUs + 1", on the grounds that that ensures there will always be a thread available whenever a CPU is idle).
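For what it's worth, both rules of thumb are a one-liner once you ask the JVM for the core count. The method names below are made up for illustration, and these are rules of thumb rather than guarantees.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    // Rule 1: one thread per core when nothing blocks.
    static int oneThreadPerCore(int cores) {
        return cores;
    }

    // Rule 2: the "2x number of CPUs + 1" variant.
    static int twoPerCorePlusOne(int cores) {
        return 2 * cores + 1;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " rule1=" + oneThreadPerCore(cores)
                + " rule2=" + twoPerCorePlusOne(cores));
        // Either number can be used to size a fixed pool for nonblocking work.
        ExecutorService pool = Executors.newFixedThreadPool(oneThreadPerCore(cores));
        pool.shutdown();
    }
}
```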

With actors I wouldn't worry about having too many. They're very lightweight.

Upvotes: 1

almendar

Reputation: 1813

If you really have no experience with Akka, try starting with something simple, like a one-to-one actor-for-thread rewrite of your code. That will make it easier to grasp how things work in Akka.

Spin up two actors at the beginning: one for receiving requests and one for writing to the output file. Then, when a request is received, have the request-receiver actor create an actor that does the computation and sends the result to the writing actor.
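The shape of that setup can be sketched without Akka at all. The following is a hand-rolled approximation in plain Java, not Akka's API: the `Writer` class plays the writing actor by using a single-threaded executor as its mailbox, and a worker pool stands in for the per-request compute actors. All names here are hypothetical, and the "computation" is just `toUpperCase()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MiniActors {
    // Writer "actor": messages are handled one at a time on a single thread,
    // so the mutable list needs no locks.
    static class Writer {
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
        private final List<String> lines = new ArrayList<>();

        // Fire-and-forget send, similar in spirit to an actor's tell.
        void tell(String line) {
            mailbox.execute(() -> lines.add(line));
        }

        // Queue a final message that snapshots the state, then stop the mailbox.
        List<String> stopAndGet() throws Exception {
            Callable<List<String>> snapshot = () -> new ArrayList<>(lines);
            List<String> result = mailbox.submit(snapshot).get();
            mailbox.shutdown();
            return result;
        }
    }

    // Receiver role: hand each request to a worker, which computes a result
    // and sends it to the writer.
    static List<String> run(List<String> requests, int workers) throws Exception {
        Writer writer = new Writer();
        ExecutorService computePool = Executors.newFixedThreadPool(workers);
        for (String req : requests) {
            computePool.execute(() -> writer.tell(req.toUpperCase()));
        }
        computePool.shutdown();
        computePool.awaitTermination(10, TimeUnit.SECONDS);
        return writer.stopAndGet();
    }

    public static void main(String[] args) throws Exception {
        // The order of the output lines depends on which worker finishes first.
        System.out.println(run(List.of("a", "b", "c"), 2));
    }
}
```

The design point is the same one actors give you: only one thread ever touches the writer's state, so the output files cannot be corrupted by concurrent writes.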

Upvotes: 0

vptheron

Reputation: 7476

1) Actors are a good choice to design complex interactions between components since they resemble "real life" a lot. You can see them as different people sending each other requests, it is very natural to model interactions. However, they are most powerful when you want to manage changing state in your application, which does not seem to be the case for you. You can achieve fast concurrency without actors. Up to you.

2) If none of your operations are blocking, the best rule is number of threads = number of CPUs. If you use a non-blocking HTTP client, and NIO when writing your output files, then all your I/O is non-blocking and you can safely set the thread count for your app to the CPU count of your machine.

3) The documentation on http://akka.io is very very good and comprehensive. If you have no clue how to use the actor model I would recommend getting a book - not necessarily about Akka.

Upvotes: 2
