Reputation: 31
I want to use the GPU to do some parallel computing on a network (graph), preferably with C++ AMP. How can I make the network data available in GPU memory?
Specifically, how can I copy an adjacency list to GPU memory so that a kernel can use it?
An adjacency matrix would be too large for a large, sparse network, so I don't want to use one.
Upvotes: 3
Views: 241
Reputation: 13723
Here's the simplest useful example I could find. It's an unoptimized example of a matrix multiply using C++ AMP.
A couple of key points:
This code is unoptimized. Look at the examples in the Chapter4 folder of the C++ AMP Book Codeplex project for optimized examples, and at the book for a discussion of why they were written that way. There are also examples on MSDN as suggested by Dan H.
Copies to and from the GPU are minimized by declaring the input array_view as const, to prevent copy out, and calling discard_data on the output array_view, to prevent copy in.
The example explicitly calls array_view::synchronize() to guarantee that the result has been copied back into CPU memory. This isn't strictly required, as an implicit copy would occur when the array_view data is accessed on the CPU, for example by reading an element c[i].
C++ AMP queues work to the GPU, so work executes on the GPU asynchronously. It is only guaranteed to have completed when results are accessed on the CPU or an explicit synchronization call is made. In this regard it behaves similarly to a std::future.
Here's the code:
#include <amp.h>
#include <vector>
using namespace concurrency;

// Computes C (M x N) = A (M x W) * B (W x N); all matrices are row-major.
// vC must already hold M * N elements.
void MatrixMultiply(std::vector<float>& vC,
                    const std::vector<float>& vA,
                    const std::vector<float>& vB, int M, int N, int W)
{
    // Create read-only wrappers to the input data.
    array_view<const float,2> a(M, W, vA);
    array_view<const float,2> b(W, N, vB);
    // Create a write-only wrapper to the output data.
    array_view<float,2> c(M, N, vC);
    c.discard_data();
    // Declare a kernel to use one GPU thread per matrix element.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp)
    {
        int row = idx[0];
        int col = idx[1];
        float sum = 0.0f;
        for(int i = 0; i < W; i++)
            sum += a(row, i) * b(i, col);
        c[idx] = sum;
    });
    // Force a synchronization of the result array_view data onto the CPU.
    c.synchronize();
}
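To check the kernel's output, it can help to have a plain CPU version to compare against. The following is only a sketch in standard C++ (no C++ AMP), assuming the same row-major layout the array_view wrappers imply, so that a(row, i) corresponds to vA[row * W + i]. The name MatrixMultiplyCpu is hypothetical and not part of C++ AMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference (hypothetical helper): C (M x N) = A (M x W) * B (W x N),
// with every matrix stored row-major in a flat vector.
void MatrixMultiplyCpu(std::vector<float>& vC,
                       const std::vector<float>& vA,
                       const std::vector<float>& vB, int M, int N, int W)
{
    // The flat inputs must match the stated dimensions.
    assert(vA.size() == static_cast<std::size_t>(M) * W);
    assert(vB.size() == static_cast<std::size_t>(W) * N);

    // Size the output here; the GPU version expects it to be pre-sized.
    vC.assign(static_cast<std::size_t>(M) * N, 0.0f);

    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            float sum = 0.0f;
            for (int i = 0; i < W; ++i)
                sum += vA[row * W + i] * vB[i * N + col];
            vC[row * N + col] = sum;
        }
    }
}
```

Running both versions on the same small inputs and comparing the results element by element (with a small tolerance for larger matrices) is a cheap sanity check before optimizing.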
Upvotes: 0
Reputation: 706
If you have the data on the CPU (in normal C++ code), you have to copy it to the GPU using C++ AMP. The C++ AMP Overview is a good place to learn the basics.
If it is a simple array or vector, this involves wrapping the data in an array_view object and then operating on the data in functions or lambdas marked with restrict(amp).
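An adjacency list stored as a vector of vectors is not a single contiguous block, so one common approach (an assumption here, not something the question specifies) is to flatten it first into two plain arrays, often called compressed sparse row (CSR) form. The sketch below is standard C++ only; CsrGraph, ToCsr and Degree are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a graph in compressed sparse row (CSR) form.
struct CsrGraph {
    std::vector<int> offsets;   // size = vertex count + 1
    std::vector<int> neighbors; // size = total number of directed edges
};

// Flatten a vector-of-vectors adjacency list into two contiguous arrays.
CsrGraph ToCsr(const std::vector<std::vector<int>>& adjacency)
{
    const int vertexCount = static_cast<int>(adjacency.size());
    (void)vertexCount; // only used by the assert below

    CsrGraph g;
    g.offsets.reserve(adjacency.size() + 1);
    g.offsets.push_back(0);
    for (const auto& list : adjacency) {
        for (int v : list) {
            assert(v >= 0 && v < vertexCount); // neighbor ids must be valid
            g.neighbors.push_back(v);
        }
        // Record where the next vertex's neighbors begin.
        g.offsets.push_back(static_cast<int>(g.neighbors.size()));
    }
    return g;
}

// Neighbors of vertex u are neighbors[offsets[u]] .. neighbors[offsets[u + 1] - 1].
int Degree(const CsrGraph& g, int u)
{
    return g.offsets[u + 1] - g.offsets[u];
}
```

Each of the two flat vectors could then be wrapped in a one-dimensional array_view<const int,1>, in the same way the matrix example wraps its input vectors, and a kernel running one thread per vertex would loop from offsets[u] to offsets[u + 1]. Memory use grows with the number of vertices plus the number of edges, which is why this layout tends to suit large sparse networks better than an adjacency matrix.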
Upvotes: 2