Reputation: 35
I'm passing a large 2D array (in C) to the device and computing all possible combinations of its rows. For example:
A =
id  val1  val2
1   100   200
2   400   800

Combination =
id1  id2  sumval1  sumval2
1    2    500      1000
Because of the size of the original array, storing and returning all possible combinations is not feasible. I would like to return only the combinations where sumval1 > 500 and sumval2 > 1000.
How can I return just this subset of combinations to the host so it can be written to a file, given that I won't know in advance how many combinations meet the conditions?
Upvotes: 1
Views: 1182
Reputation: 152279
Some possible approaches:
In-kernel malloc: at the completion of your combination-creation, collect all the individual combinations into a single buffer created with in-kernel malloc. Then pass the total size of this buffer, and the pointer to it, back to the host. The host then allocates a new buffer of that size using cudaMalloc and launches a kernel to copy the data from the malloc'd buffer to the cudaMalloc'd buffer. Once this copy kernel has completed, the host can copy the data back from the cudaMalloc'd buffer. (The extra copy is needed because memory allocated with in-kernel malloc cannot be passed to host API calls such as cudaMemcpy.)
I would suggest that approach 1 is probably the best without knowing anything else about what you are trying to do. In-kernel malloc is not particularly fast when making large numbers of small allocations. Also, when using in-kernel malloc, note the default device heap size limit (8 MB), which can be increased with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...).
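A minimal CUDA sketch of the in-kernel malloc approach, assuming the combinations are pairs of rows, the rows arrive as two separate val1/val2 arrays, and ids are 1-based row indices. Combo, findCombos, gatherCombos, copyCombos, freeCombos, collectCombos and d_all are hypothetical names, and sample rows 3 and 4 are made up. Each thread mallocs a small buffer for its own matches, one thread then concatenates them into a single malloc'd buffer, and the host reads back the pointer and size, allocates with cudaMalloc, runs a copy kernel and copies the result to the host. Error checking of runtime calls is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical result record: one qualifying pair of rows (1-based ids).
struct Combo { int id1, id2, sum1, sum2; };

__device__ bool keep(int s1, int s2) { return s1 > 500 && s2 > 1000; }

// Kernel 1: thread i pairs row i with every later row j and stores the pairs
// that pass the filter in its own small buffer on the device heap.
__global__ void findCombos(const int *v1, const int *v2, int n, Combo **lists,
                           int *counts, int *total, int *failed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int c = 0;
    for (int j = i + 1; j < n; ++j)          // count first: one malloc per thread
        if (keep(v1[i] + v1[j], v2[i] + v2[j])) ++c;
    Combo *buf = c ? (Combo *)malloc(c * sizeof(Combo)) : nullptr;
    if (c && !buf) { *failed = 1; c = 0; }   // device heap exhausted
    for (int j = i + 1, k = 0; j < n && k < c; ++j)
        if (keep(v1[i] + v1[j], v2[i] + v2[j]))
            buf[k++] = Combo{i + 1, j + 1, v1[i] + v1[j], v2[i] + v2[j]};
    lists[i] = buf;
    counts[i] = c;
    atomicAdd(total, c);
}

// Kernel 2 (one thread): concatenate the small buffers into a single
// malloc'd buffer, free them, and publish its address for the host.
__device__ Combo *d_all;
__global__ void gatherCombos(Combo **lists, const int *counts, int n, int total)
{
    Combo *all = (Combo *)malloc(total * sizeof(Combo));
    for (int i = 0, pos = 0; i < n; ++i) {
        for (int k = 0; all && k < counts[i]; ++k) all[pos++] = lists[i][k];
        free(lists[i]);                      // free(nullptr) is ignored
    }
    d_all = all;
}

// Kernel 3: copy from the device heap into cudaMalloc'd memory, which
// (unlike heap memory) can be passed to cudaMemcpy.
__global__ void copyCombos(const Combo *src, Combo *dst, int total)
{
    for (int k = blockIdx.x * blockDim.x + threadIdx.x; k < total;
         k += gridDim.x * blockDim.x)
        dst[k] = src[k];
}
__global__ void freeCombos(Combo *p) { free(p); }

std::vector<Combo> collectCombos(const std::vector<int> &h1, const std::vector<int> &h2)
{
    int n = (int)h1.size(), h_total = 0, h_failed = 0;
    int *v1, *v2, *counts, *state;           // state[0] = total, state[1] = failed
    Combo **lists;
    cudaMalloc(&v1, n * sizeof(int));
    cudaMalloc(&v2, n * sizeof(int));
    cudaMalloc(&counts, n * sizeof(int));
    cudaMalloc(&lists, n * sizeof(Combo *));
    cudaMalloc(&state, 2 * sizeof(int));
    cudaMemset(state, 0, 2 * sizeof(int));
    cudaMemcpy(v1, h1.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(v2, h2.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    findCombos<<<(n + 255) / 256, 256>>>(v1, v2, n, lists, counts, state, state + 1);
    cudaMemcpy(&h_total, state, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_failed, state + 1, sizeof(int), cudaMemcpyDeviceToHost);
    if (h_failed) fprintf(stderr, "device heap too small: results incomplete\n");
    std::vector<Combo> out;
    if (h_total > 0) {
        gatherCombos<<<1, 1>>>(lists, counts, n, h_total);
        Combo *heapPtr = nullptr, *dst = nullptr;
        cudaMemcpyFromSymbol(&heapPtr, d_all, sizeof(heapPtr));  // pointer + size now on host
        if (heapPtr) {
            cudaMalloc(&dst, h_total * sizeof(Combo));
            copyCombos<<<(h_total + 255) / 256, 256>>>(heapPtr, dst, h_total);
            out.resize(h_total);
            cudaMemcpy(out.data(), dst, h_total * sizeof(Combo), cudaMemcpyDeviceToHost);
            freeCombos<<<1, 1>>>(heapPtr);
            cudaFree(dst);
        }
    }
    cudaFree(v1); cudaFree(v2); cudaFree(counts); cudaFree(lists); cudaFree(state);
    return out;
}

int main()
{
    // The device heap defaults to 8 MB; raise it before any kernel calls malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t(64) << 20);
    // Rows 1-2 come from the question; rows 3-4 are made-up sample data.
    std::vector<int> val1 = {100, 400, 300, 250}, val2 = {200, 800, 900, 300};
    for (const Combo &c : collectCombos(val1, val2))
        printf("%d %d %d %d\n", c.id1, c.id2, c.sum1, c.sum2);
    return 0;
}
```

With the sample data the qualifying pairs are (2, 3), (2, 4) and (3, 4). The single-thread gather step is kept deliberately simple; for large result sets, a prefix sum over counts would let many threads copy into the combined buffer in parallel.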
Upvotes: 3
Reputation: 7638
You can page the results:
1. Create a fixed-size result array (say, Z items).
2. Return not only the results but also the point where you stopped (last_id1, last_id2).
3. On the next call, pass a new starting point (start_id1, start_id2) based on your last result.
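You can use streams to keep the GPU busy, for example by computing the next page while the previous one is being copied back.
Because each call is defined only by its starting point, you could even distribute the calculation across several GPUs.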
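A sketch of the paging idea, under the same assumptions as before (pairs of rows, separate val1/val2 arrays, 1-based ids; comboPage and pagedCombos are hypothetical names and rows 3 and 4 are made-up data). One simplification is swapped in: instead of stopping when the result array fills up, each call examines a fixed window of Z candidate pairs, so the Z-item array can never overflow and the next starting point is known before the launch. The starting point (start_id1, start_id2) is encoded as the single 0-based index start_id1 * n + start_id2. Streams are not used here.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical result record: one qualifying pair of rows (1-based ids).
struct Combo { int id1, id2, sum1, sum2; };

// One page: each thread examines one candidate p = i * n + j from the window
// [start, start + Z) of the n x n grid; only j > i is a real pair. A page
// examines at most Z candidates, so the Z-slot result array cannot overflow.
__global__ void comboPage(const int *v1, const int *v2, int n, long long start,
                          int Z, Combo *out, int *count)
{
    long long p = start + (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= start + Z || p >= (long long)n * n) return;
    int i = (int)(p / n), j = (int)(p % n);
    if (j <= i) return;
    int s1 = v1[i] + v1[j], s2 = v2[i] + v2[j];
    if (s1 > 500 && s2 > 1000)
        out[atomicAdd(count, 1)] = Combo{i + 1, j + 1, s1, s2};
}

std::vector<Combo> pagedCombos(const std::vector<int> &h1,
                               const std::vector<int> &h2, int Z)
{
    int n = (int)h1.size();
    int *v1, *v2, *count;
    Combo *page;
    cudaMalloc(&v1, n * sizeof(int));
    cudaMalloc(&v2, n * sizeof(int));
    cudaMalloc(&count, sizeof(int));
    cudaMalloc(&page, Z * sizeof(Combo));          // fixed result array of Z items
    cudaMemcpy(v1, h1.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(v2, h2.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    std::vector<Combo> all, buf(Z);
    const long long last = (long long)n * n;
    for (long long start = 0; start < last; start += Z) {
        cudaMemset(count, 0, sizeof(int));
        comboPage<<<(Z + 255) / 256, 256>>>(v1, v2, n, start, Z, page, count);
        int h_count = 0;
        cudaMemcpy(&h_count, count, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(buf.data(), page, h_count * sizeof(Combo), cudaMemcpyDeviceToHost);
        // A real program would append buf[0..h_count) to the output file here.
        all.insert(all.end(), buf.begin(), buf.begin() + h_count);
        // Next page starts at 0-based (start_id1, start_id2) = ((start + Z) / n, (start + Z) % n).
    }
    cudaFree(v1); cudaFree(v2); cudaFree(count); cudaFree(page);
    // Within a page the atomicAdd order is arbitrary, so sort for stable output.
    std::sort(all.begin(), all.end(), [](const Combo &a, const Combo &b) {
        return a.id1 != b.id1 ? a.id1 < b.id1 : a.id2 < b.id2;
    });
    return all;
}

int main()
{
    // Rows 1-2 come from the question; rows 3-4 are made-up sample data.
    std::vector<int> val1 = {100, 400, 300, 250}, val2 = {200, 800, 900, 300};
    for (const Combo &c : pagedCombos(val1, val2, 5))   // Z = 5 candidates per page
        printf("%d %d %d %d\n", c.id1, c.id2, c.sum1, c.sum2);
    return 0;
}
```

Indexing over the full n x n grid wastes roughly half the threads on j <= i, but it keeps the start/stop bookkeeping trivial. To keep the GPU busy as suggested, a version could alternate two page buffers on two streams with cudaMemcpyAsync (which needs pinned host memory to overlap), so one page is copied back while the next is computed.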
Upvotes: 0