Reputation: 105
I'm launching 256 threads in total. When I launch them as a single block of 16x16 threads, everything works fine. But when I launch them as a 2x2 grid of blocks, each with 8x8 threads, the kernel loops forever. The underlying problem is that my kernel code waits for partial results from other blocks. After running several tests, I observed that the blocks start in an apparently random order and seem to execute sequentially rather than concurrently.
Do CUDA blocks run in parallel if they're launched from the same kernel? I don't think the GPU I'm using is the limitation, since I'm launching only 256 threads and a GTX 580 can handle that many (the single-block launch of 16x16 threads works fine). Is there a way to find out the order in which the blocks execute, or to specify it?
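For reference, here is a minimal sketch of the kind of test that can be used to observe the order in which blocks start. It is not my actual kernel: `recordBlockOrder`, `arrivalCounter`, and `order` are hypothetical names made up for illustration. One thread per block takes a ticket from a global atomic counter and stores its linear block ID in that slot. Note that this only shows the order in which blocks reach the atomic, not whether they overlap in time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global ticket counter (hypothetical name); each block increments it once.
__device__ unsigned int arrivalCounter = 0;

// Records, for each ticket number, the linear ID of the block that drew it.
__global__ void recordBlockOrder(unsigned int *order)
{
    // Only one thread per block takes a ticket.
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        unsigned int ticket  = atomicAdd(&arrivalCounter, 1u);
        unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x;
        order[ticket] = blockId;
    }
}

int main()
{
    dim3 grid(2, 2);   // 2x2 blocks, as in the failing launch
    dim3 block(8, 8);  // 8x8 threads per block, 256 threads in total
    const int numBlocks = 4;  // grid.x * grid.y

    unsigned int *dOrder = nullptr;
    if (cudaMalloc(&dOrder, numBlocks * sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    recordBlockOrder<<<grid, block>>>(dOrder);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int hOrder[numBlocks];
    cudaError_t err = cudaMemcpy(hOrder, dOrder, sizeof(hOrder),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dOrder);
        return 1;
    }

    for (int i = 0; i < numBlocks; ++i)
        printf("ticket %d taken by block %u\n", i, hOrder[i]);

    cudaFree(dOrder);
    return 0;
}
```

Running a test like this several times is how the varying start order shows up. Unlike my real kernel, this sketch never waits on another block, so it always terminates.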
Upvotes: 0
Views: 4248