Reputation: 2691
For example, I have three tasks, A, B, and C, where B and C both depend on A, and there are sufficient CUs to run B and C at the same time. I enqueue A and C on queue0, and B on queue1. There is then a huge delay after A finishes and before B starts, which makes the whole job take longer than it does with only one queue.
Is this normal? Or could I have done something wrong?
I will write sample code if required; the original code is heavily encapsulated. Essentially, I just create an event when enqueuing A and pass it to the enqueue call for B, and both queues are ordinary in-order queues. Nothing seems to be special.
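To make the setup concrete, here is a minimal timing model of it in Python. It is only a sketch, not OpenCL code: the function name `makespan`, the task durations, and the `event_latency` value are all made up for illustration, and the model assumes each in-order queue runs its commands back to back while a wait on an event from another queue costs a fixed extra delay.

```python
def makespan(queues, durations, deps, event_latency):
    """Model in-order queues with event waits; return (finish_time, start_times).

    queues:        {queue_name: [task, ...]} in enqueue order
    durations:     {task: run time} (hypothetical numbers)
    deps:          {task: [tasks it waits on]}
    event_latency: assumed extra delay when the awaited event belongs to
                   a command in a different queue
    """
    owner = {t: q for q, tasks in queues.items() for t in tasks}
    pending = {q: list(tasks) for q, tasks in queues.items()}
    queue_free = {q: 0.0 for q in queues}
    start, end = {}, {}

    while any(pending.values()):
        progressed = False
        for q, tasks in pending.items():
            if not tasks:
                continue
            t = tasks[0]
            wait_on = deps.get(t, ())
            if any(d not in end for d in wait_on):
                continue  # a dependency has not been scheduled yet
            ready = queue_free[q]  # in-order: previous command must finish
            for d in wait_on:
                latency = event_latency if owner[d] != q else 0.0
                ready = max(ready, end[d] + latency)
            start[t] = ready
            end[t] = ready + durations[t]
            queue_free[q] = end[t]
            tasks.pop(0)
            progressed = True
        if not progressed:
            raise ValueError("queues are deadlocked on each other's events")
    return max(end.values()), start


# Hypothetical durations in milliseconds.
durations = {"A": 2.0, "B": 3.0, "C": 3.0}
deps = {"B": ["A"], "C": ["A"]}

one_queue, _ = makespan({"queue0": ["A", "B", "C"]}, durations, deps, 4.0)
two_queues, starts = makespan(
    {"queue0": ["A", "C"], "queue1": ["B"]}, durations, deps, 4.0
)
print(one_queue)    # 8.0
print(two_queues)   # 9.0
print(starts["B"])  # 6.0
```

In this model the two-queue version only wins while the cross-queue delay is shorter than the work it overlaps with (here, the 3 ms of C). With the assumed 4 ms delay, B starts at 6 ms instead of 2 ms and the whole job ends later than with a single queue, which matches what I am seeing.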
Upvotes: 2
Views: 373
Reputation: 11910
I couldn't find any information about latencies, but to call something normal we would need a statistically derived latency baseline for all platforms. Here is mine:
HD7870 and R7-240 show the same behaviour. Windows 10, dual-channel RAM, OpenCL 1.2 (64-bit build), CodeXL profiling, all in-order queues, and some old drivers from before Crimson.
There were background processes running (Avira, Google Chrome, ...) which are advanced enough to use the GPU for their own purposes and may hinder kernel execution.
My solution was pipelining: using many independent queues to hide their event latencies, which worked like a charm. The R7-240 ran fine with 16 queues. It has only 2 ACE units, so newer cards with 4-8 of them could work with more queues.
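As a rough illustration of why more queues can hide the latency, here is a toy Python model. It is a sketch under loud assumptions, not a measurement: the name `pipelined_makespan` and all numbers are hypothetical, the device is simplified to run one command at a time, and every queue has to wait a fixed `gap` after one of its commands finishes before its next command can start.

```python
def pipelined_makespan(total_commands, num_queues, work, gap):
    """Toy model: finish time when commands are spread over several queues.

    work: assumed run time of one command
    gap:  assumed per-queue latency between one command finishing and the
          next command of the same queue being able to start
    """
    # Split the commands as evenly as possible across the queues.
    remaining = [
        total_commands // num_queues + (1 if i < total_commands % num_queues else 0)
        for i in range(num_queues)
    ]
    ready = [0.0] * num_queues  # earliest time each queue can issue again
    device_free = 0.0           # simplification: one command at a time

    while any(remaining):
        # Pick the queue whose next command becomes ready first.
        q = min(
            (i for i in range(num_queues) if remaining[i]),
            key=lambda i: ready[i],
        )
        start = max(device_free, ready[q])
        device_free = start + work
        ready[q] = device_free + gap
        remaining[q] -= 1
    return device_free


# 16 commands of 1 ms each with an assumed 3 ms gap per queue.
for n in (1, 2, 4):
    print(n, pipelined_makespan(16, n, 1.0, 3.0))
# 1 61.0
# 2 30.0
# 4 16.0
```

In this model, once there are enough queues that some queue always has a command ready, the gaps are completely covered by other queues' work and the total time drops to the pure compute time.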
What I didn't try, and still wonder about, is the performance of N queues waiting on the completion of M other queues through an event list. Maybe a tree-like waiting structure would be better for many queues if they lag too much.
Upvotes: 1