Reputation: 4470
Given a synchronous program (Minecraft) that needs to process x lighting updates while the world is generating, a technique for increasing performance is to patch Minecraft to queue the lighting updates instead, then process them asynchronously, removing any duplicate scheduled lighting updates from the queue.
This provides some deduplication, as well as spreading the load of world generation out over the next few tick cycles.
This queue is currently unbounded, and before the duplicate-location check was added it was a source of poor performance and memory leaks as it grew to ridiculous sizes.
What can you do to solve the memory leak in the case where the queue is growing faster than it is being processed? Are there techniques to force a thread to get more time, or block until the queue is under a certain length? We can't provide back pressure, as that would cause lost lighting updates that wouldn't necessarily be rescheduled. Also, potentially having a maximum queued time before processing is guaranteed could be ideal.
I would have provided code or asked in more abstract terms, but I'm not sure of the specifics.
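Roughly, though, the intent is something like the sketch below. This is illustrative only - the class, field and method names are mine, not the actual patch, and I'm assuming block positions are packed into a long:

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class LightingUpdateQueue {
    // Pending block positions; the set suppresses duplicate scheduling of the same position.
    private final Queue<Long> pending = new ConcurrentLinkedQueue<>();
    private final Set<Long> scheduled = ConcurrentHashMap.newKeySet();

    /** Called from world generation instead of recalculating light immediately. */
    public void schedule(long packedBlockPos) {
        if (scheduled.add(packedBlockPos)) {   // true only if not already queued
            pending.add(packedBlockPos);
        }
    }

    /** Called by the async lighting worker (or over the next few ticks) to drain some work. */
    public void processSome(int maxPerCall) {
        for (int i = 0; i < maxPerCall; i++) {
            Long pos = pending.poll();
            if (pos == null) {
                return;                        // queue drained
            }
            scheduled.remove(pos);
            recalculateLight(pos);             // stand-in for the real lighting pass
        }
    }

    private void recalculateLight(long packedBlockPos) {
        // ... actual lighting update would go here ...
    }
}
```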
Upvotes: 1
Views: 79
Reputation: 8414
You cannot slow down a thread as such, other than by putting sleep statements into it. You can play with thread priorities, but that simply changes the order in which threads are scheduled. This may have the effect of slowing down a thread, but only if there are higher priority threads taking all the run time.
What you need is some sort of work quantity regulator, not a work rate regulator. That is, attempt to do only as much work as can be performed by the underlying machine. You may want to look at ZeroMQ: this would be a big architectural change (asynchronous for a start!), but stick with me for a moment.
Programming with ZeroMQ follows the Actor model: threads communicate over sockets, which act like message queues. The beauty is that there's far more to these sockets than a simple point-to-point link; there are patterns too. For example, you might have 8 threads doing lighting. You would send lighting requests to them through a single PUSH/PULL socket; whichever thread is available picks up the message and processes it. You would also configure that socket so that if the high-water mark is reached (the lighting threads aren't keeping up), new lighting request messages supplant old ones in the queue (the lighting threads process only as many as they can actually cope with, biased towards the newest ones).
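To give a flavour, here is a rough sketch using the pure-Java implementation JeroMQ (org.zeromq:jeromq). The endpoint name, worker count and high-water mark are arbitrary choices of mine, and exactly what happens when the high-water mark is hit (block, fail, or keep only the newest messages) depends on which socket options you choose:

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

import java.nio.ByteBuffer;

public class LightingPipeline {
    public static void main(String[] args) {
        try (ZContext context = new ZContext()) {
            // One PUSH socket fans requests out to whichever worker is free.
            ZMQ.Socket push = context.createSocket(SocketType.PUSH);
            push.setSndHWM(1000);                 // high-water mark: cap on queued, unsent requests
            push.bind("inproc://lighting");

            for (int i = 0; i < 8; i++) {         // 8 lighting worker threads
                new Thread(() -> {
                    ZMQ.Socket pull = context.createSocket(SocketType.PULL);
                    pull.connect("inproc://lighting");
                    while (!Thread.currentThread().isInterrupted()) {
                        byte[] msg = pull.recv();             // blocks until a request arrives
                        long packedPos = ByteBuffer.wrap(msg).getLong();
                        // ... run the actual lighting update for packedPos here ...
                    }
                }).start();
            }

            // Submitter side: with a non-blocking send, hitting the high-water mark makes
            // send() return false instead of blocking, so the caller can decide what to do.
            byte[] payload = ByteBuffer.allocate(Long.BYTES).putLong(0L).array();
            push.send(payload, ZMQ.DONTWAIT);

            // (A real server would keep the context open for the lifetime of the process.)
        }
    }
}
```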
And the sockets can go over network connections too. You'd be able to distribute it over several computers with hardly any code changes.
There are problems with this suggestion of mine: you'd be serialising and deserialising objects an awful lot. You'd be tearing up the existing architecture. But you would be leveraging a comms framework that distributes, scales, and already does load balancing and queue management in a useful way. It would be fun to try!
EDIT in response to comment
One thing I can think of is to add a submission time-of-day field to a lighting request, and have the lighting thread(s) calculate the time between initial submission of each request and when they complete the processing. This execution time measurement could be used to update an average time-to-completion value across all lighting threads (put it in some shared memory guarded by a mutex, or something like that).
The thing submitting the lighting requests could then refer to this average, and if it's increasing, it should ease up on the rate. Similarly if the average processing time is decreasing, it could up the rate. If the computer itself started getting busy doing something else in the background, this would then adapt to that shortage of execution time.
Effectively this is building profiling of the lighting processing into the application, feeding that back to the submitter, and learning during execution just how many lighting requests per second can be submitted without falling too far behind.
This would have the side effect of ensuring that the queue itself doesn't get too full and grow indefinitely. It's not bounding the queue to any particular length, but it is limiting how much time there is between the front and back of the queue. So on a faster machine, the queue will on average be longer, and shorter on a slower machine.
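As a sketch of that shared measurement (the names and the smoothing factor are mine, and I've used an atomic compare-and-set instead of a mutex, which amounts to the same thing here):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LightingLatencyTracker {
    // Exponentially weighted moving average of request completion time, in nanoseconds.
    private final AtomicLong averageLatencyNanos = new AtomicLong(0);
    private static final double ALPHA = 0.05;  // smaller = smoother, slower-moving average

    /** Called by a lighting thread when it finishes a request submitted at submitNanos (System.nanoTime()). */
    public void recordCompletion(long submitNanos) {
        long latency = System.nanoTime() - submitNanos;
        long prev;
        long next;
        do {
            prev = averageLatencyNanos.get();
            next = (prev == 0) ? latency
                               : (long) (ALPHA * latency + (1 - ALPHA) * prev);
        } while (!averageLatencyNanos.compareAndSet(prev, next));
    }

    /** Read by the submitter to decide whether to ease up on, or ramp up, the request rate. */
    public long getAverageLatencyNanos() {
        return averageLatencyNanos.get();
    }
}
```

The submitter polls getAverageLatencyNanos() every so often and raises or lowers its per-tick submission budget against a target.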
Thermal Management in CPUs
Of course, the definition of a "fast" and "slow" computer is a bit complicated these days. A modern Intel / AMD CPU (and lots of ARMs too) will ramp up its clock rate if it determines that there's a lot of runtime required. So the lighting request rate that can be sustained is going to increase as the CPU works out that Turbo mode is required and ramps up the clock rate, and things run faster. So by adapting to the rate that can be sustained whilst meeting a time target (i.e. keeping the average completion time reasonable), your code will make use of the increased speed of the CPU as it gets used to the workload you're asking it to do.
You will need to be a bit careful though; changing clock speed is not instantaneous - I've seen 300ms delays during which literally nothing happens anywhere in the entire machine - which, when they happen, will look like everything has suddenly taken 300ms longer. That's why the "average" completion time is required - it'll smooth out perturbations in the measurement.
Getting the length of the average right will matter too, otherwise you'll get into a condition where your code's demands on the CPU and the CPU's clock rate oscillate.
Also, you will want to keep ramping up the workload even if the average completion time has reached stability. It could be that doing so pushes the CPU to raise its clock rate further, letting more lighting requests be done without impacting the completion time.
Basically you want to be increasing the workload, and only reducing it if the latency actually increases, but doing so over a timescale slower than the CPU's own thermal management / clock management cycles.
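To make that concrete, something along these lines - the target latency, step sizes and adjustment cadence are illustrative values of mine, not tuned numbers:

```java
public class LightingRateController {
    private double requestsPerTick = 50;                 // current submission budget (starting guess)
    private long previousAverageNanos = 0;

    private static final long TARGET_NANOS = 5_000_000;  // 5 ms target average completion time
    private static final double INCREASE = 1.05;         // gentle multiplicative ramp-up
    private static final double DECREASE = 0.80;         // sharper back-off when latency grows

    /**
     * Call on a slow cadence (say, once a second) - slower than the CPU's own
     * clock/thermal management cycle - passing in the current average completion latency.
     */
    public int adjust(long averageLatencyNanos) {
        boolean latencyRising = averageLatencyNanos > previousAverageNanos
                             && averageLatencyNanos > TARGET_NANOS;
        requestsPerTick = latencyRising ? requestsPerTick * DECREASE
                                        : requestsPerTick * INCREASE;
        requestsPerTick = Math.max(1.0, requestsPerTick);
        previousAverageNanos = averageLatencyNanos;
        return (int) requestsPerTick;
    }
}
```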
Upvotes: 2
Reputation: 140427
What can you do to solve the memory leak in the case where the queue is growing faster than it is being processed?
In that case, increasing memory footprints are probably your smallest problem.
Are there techniques to force a thread to get more time, or block until the queue is under a certain length?
Java threads can be prioritized, but whether that leads to observable effects depends very much on the underlying operating system.
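For what it's worth, this snippet is about all the control Java itself gives you; the worker body is just a placeholder:

```java
Thread lightingWorker = new Thread(() -> {
    // ... drain the lighting queue here ...
});
// A hint only: priorities range from Thread.MIN_PRIORITY to Thread.MAX_PRIORITY,
// and many operating systems largely ignore them for ordinary (non-realtime) threads.
lightingWorker.setPriority(Thread.MAX_PRIORITY);
lightingWorker.start();
```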
It is hard to help beyond the above - you see, this depends very much on the details of the architecture/implementation, so I would be looking into questions about exactly those details.
You see - it seems that you can't influence the number of events flowing into your system, so you simply have to design a solution that works for that flow. This is a specific problem that requires you to sit down and, for example, profile your threads to understand where and how they spend their time. The conclusion might be, for example, that the machine doing these computations doesn't have enough horsepower. But that is something that you have to do.
Upvotes: 1