CUDA: how to efficiently represent 2-D arrays on the GPU

Question

I need to process a 2-D array of dimensions K x N on the GPU, where K is small (3, 4, or 5) and N ranges from millions to hundreds of millions. Processing is done one column of K elements at a time, with each column handled by a separate invocation of the kernel (that is, by its own thread). What is the most efficient way to represent the K x N array on the GPU:

1) as a single 1-D array, placing the K elements of a column in consecutive locations, so that each thread processes elements K*thread_id, K*thread_id + 1, ..., K*thread_id + K - 1;

2) as K separate 1-D arrays, where each array stores one row of the original array;

3) something else?
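To make options 1 and 2 concrete, here is a minimal sketch of how each thread would index its column under the two layouts. The per-column work (a plain sum), the kernel names, and the host helper `layouts_agree` are all hypothetical and chosen only for illustration. For option 2 the sketch also assumes the K row arrays are stored back to back in one allocation, which is not stated above. Note the difference in access pattern: in option 1 neighboring threads read addresses K elements apart, while in option 2 neighboring threads read adjacent addresses within each row.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Option 1: one 1-D array, the K elements of column i stored contiguously.
// Thread i reads a[K*i + 0], ..., a[K*i + K - 1].
__global__ void sum_columns_interleaved(const float* a, float* out, int K, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float s = 0.0f;
    for (int k = 0; k < K; ++k)
        s += a[(size_t)K * i + k];
    out[i] = s;
}

// Option 2: K row arrays. Assumption: they are stored back to back,
// so row k starts at rows + k*N. Thread i reads rows[k*N + i] for each k.
__global__ void sum_columns_by_row(const float* rows, float* out, int K, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float s = 0.0f;
    for (int k = 0; k < K; ++k)
        s += rows[(size_t)k * N + i];
    out[i] = s;
}

// Hypothetical host helper: fills element (k, i) with (i % 7) + k in both
// layouts, runs both kernels, and returns true if both produce the expected
// column sums K*(i % 7) + K*(K-1)/2.
bool layouts_agree(int K, int N)
{
    size_t in_bytes = (size_t)K * N * sizeof(float);
    size_t out_bytes = (size_t)N * sizeof(float);
    std::vector<float> h1((size_t)K * N), h2((size_t)K * N), r1(N), r2(N);
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < K; ++k) {
            float v = (float)((i % 7) + k);
            h1[(size_t)K * i + k] = v;   // option 1 layout
            h2[(size_t)k * N + i] = v;   // option 2 layout
        }

    float *d1 = nullptr, *d2 = nullptr, *o1 = nullptr, *o2 = nullptr;
    bool ok = cudaMalloc(&d1, in_bytes) == cudaSuccess
           && cudaMalloc(&d2, in_bytes) == cudaSuccess
           && cudaMalloc(&o1, out_bytes) == cudaSuccess
           && cudaMalloc(&o2, out_bytes) == cudaSuccess
           && cudaMemcpy(d1, h1.data(), in_bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d2, h2.data(), in_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (N + threads - 1) / threads;
        sum_columns_interleaved<<<blocks, threads>>>(d1, o1, K, N);
        sum_columns_by_row<<<blocks, threads>>>(d2, o2, K, N);
        // cudaMemcpy on the default stream waits for both kernels to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(r1.data(), o1, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(r2.data(), o2, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < N; ++i) {
        float expected = (float)(K * (i % 7) + K * (K - 1) / 2);
        ok = (r1[i] == expected) && (r2[i] == expected);
    }
    cudaFree(d1); cudaFree(d2); cudaFree(o1); cudaFree(o2);
    return ok;
}
```

Calling `layouts_agree(4, 1000)` from a `main` only checks that both layouts give the same results; it says nothing about which one is faster, which would have to be measured with N in the range I care about.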

Thank you!
