Reputation: 33864
In sycl we create a kernel like this:
queue.submit(
[&d_cells, &d_count_occupied](sycl::handler& cgh)
{
auto cells_accessor = d_cells.get_access<sycl::access_mode::read>(cgh);
auto count_accessor =
d_count_occupied.get_access<sycl::access_mode::write>(cgh);
cgh.parallel_for(
d_cells.range(),
[cells_accessor,
count_accessor](sycl::id<3> id, sycl::kernel_handler kh)
{
auto cell = cells_accessor.at(kh, id);
if (cell.is_occupied())
{
sycl::atomic_ref<
unsigned,
sycl::memory_order::relaxed,
sycl::memory_scope::device>
count{count_accessor[0]};
count++;
}
}
);
}
)
This kernel takes 2 buffers, 1 which holds cell information, the other is designed to count the number of "occupied" cells. Imagine now that i have the d_cells
buffer wrapped up into a class which has knowledge of the occupied cells or not. We could conceivably offer a smart function that takes a user supplied lambda to operate on the cells:
class Cell {
bool is_occupied() const;
int get_position() const;
// implementation details.
};
class Grid {
// Apply some user function to all of the occupied cells.
template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args);
private:
sycl::buffer<Cell> d_cells;
};
The intended call pattern would be something like this:
sycl::buffer<unsigned> d_count_occupied{
count_occupied.data(), count_occupied.size()};
auto function = [](auto grid_cell, sycl::kernel_handler, auto count_accessor)
{
sycl::atomic_ref<
unsigned,
sycl::memory_order::relaxed,
sycl::memory_scope::device>
count{count_accessor[0]};
count++;
};
grid.apply_all_occupied(queue, function, d_count_occupied).wait_and_throw();
This would be very cool, it simplifies and abstracts the implementation of the "Grid" substantially which is nice. But here we have a problem. The implementation of the functor that the user gives must be able to run on the device. Therefore the buffers provided need to be converted to "accessors" before being passed to the user supplied function. We could maybe work it out with some meta programming like:
template <typename TFunctor, typename... TArgs>
sycl::event apply_all_occupied(sycl::queue q, TFunctor&& function, TArgs... args) {
queue.submit(
[this, function, &args...](sycl::handler& cgh)
{
auto cells_accessor = d_cells_.get_access<sycl::access_mode::write>(cgh);
// Somehow get the access to all of the arguments here?
std::tuple accessors = {args.get_access<sycl::access_mode::read>(cgh), ...};
cgh.parallel_for(
d_cells.range(),
[cells_accessor,
accessors, function](sycl::id<3> id, sycl::kernel_handler kh)
{
auto cell = cells_accessor.at(kh, id);
function(kh, cell, accessors);
}
);
}
But this has serious issues:
get_access
calls.Is there a sensible way to implement this type of behaviour?
Upvotes: 3
Views: 390
Reputation: 6123
Yes, there is a way. Your second requirement to customize the access mode means that you want to pass in a transform operation for each buffer into apply_all_occupied()
instead of the buffer itself. I.e. you receive a parameter pack BufferAccessFuncsT &&... get_access_funcs
, where each element is a callable. For example:
int main()
{
sycl::buffer d_count_occupied;
sycl::other_buffer other_buf;
sycl::queue q;
Grid grid;
auto function = [](auto grid_cell,
sycl::kernel_handler & kh,
sycl::buffer::accessor & count_accessor,
sycl::other_buffer::accessor & buf2) {
std::cout << "Called" << std::endl;
// Do stuff, e.g.:
// sycl::atomic_ref<...> count{count_accessor[0]};
// count++;
};
grid.apply_all_occupied(
q,
function,
[&d_count_occupied](sycl::handler & cgh) { return d_count_occupied.get_access<sycl::access_mode::write>(cgh); },
[&other_buf](sycl::handler & cgh) { return other_buf.get_access<sycl::access_mode::read>(cgh); });
}
In this example I pass in two lambdas into apply_all_occupied()
that return the accessor for each buffer. Of course, it also works with just one or zero or multiple accessors. The function
in main()
expects the accessors in the same order as the lambdas were passed into apply_all_occupied()
.
Regarding your first requirement, that the user defined function should not receive a tuple but rather the parameters directly, you basically want something like a "local parameter pack variable"
// Invalid, does not compile
auto &&... accessors = (std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...;
that you can then forward to your actual function. As far as I know, something like this does not exist. But instead you can do the transform and pass the result directly to another helper function. Like so:
struct Grid
{
template <class FuncT, class... BufferAccessFuncsT>
void apply_all_occupied(sycl::queue & q, FuncT && func, BufferAccessFuncsT &&... get_access_funcs)
{
q.submit([&](sycl::handler & cgh) {
auto cells_accessor = 0; // Or whatever
// Helper function that receives the transformed arguments in the parameter pack get_access_funcs.
auto call_parallel_for_with_accessors = [&](auto &&... accessors) {
cgh.parallel_for([&](sycl::kernel_handler & kh) {
int grid_cell = cells_accessor; // Or whatever
func(grid_cell, kh, accessors...);
});
};
call_parallel_for_with_accessors((std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...);
});
}
};
The call_parallel_for_with_accessors
in Grid::apply_all_occupied()
is the helper function that receives the accessors.
Note that I removed some of the non-essential stuff from your original code to get a minimal example.
Full example (live on godbolt):
#include <iostream>
#include <utility>
namespace sycl
{
struct kernel_handler
{
};
struct handler
{
kernel_handler kh;
template <class FuncT, class... ArgsT>
void parallel_for(FuncT && func, ArgsT &&... args)
{
func(kh, std::forward<ArgsT>(args)...);
}
};
enum class access_mode
{
read,
write
};
struct buffer
{
struct accessor
{
};
template <access_mode mode>
accessor get_access(handler &)
{
return accessor{};
}
};
// Just to have another buffer type.
struct other_buffer
{
struct accessor
{
};
template <access_mode mode>
accessor get_access(handler &)
{
return accessor{};
}
};
struct queue
{
handler cgh;
template <class FuncT>
void submit(FuncT func)
{
func(cgh);
}
};
} // namespace sycl
struct Grid
{
template <class FuncT, class... BufferAccessFuncsT>
void apply_all_occupied(sycl::queue & q, FuncT && func, BufferAccessFuncsT &&... get_access_funcs)
{
q.submit([&](sycl::handler & cgh) {
auto cells_accessor = 0; // Or whatever
// Helper function that receives the transformed arguments in the parameter pack get_access_funcs.
auto call_parallel_for_with_accessors = [&](auto &&... accessors) {
cgh.parallel_for([&](sycl::kernel_handler & kh) {
int grid_cell = cells_accessor; // Or whatever
func(grid_cell, kh, accessors...);
});
};
call_parallel_for_with_accessors((std::forward<BufferAccessFuncsT>(get_access_funcs)(cgh))...);
});
}
};
int main()
{
sycl::buffer d_count_occupied;
sycl::other_buffer other_buf;
sycl::queue q;
Grid grid;
auto function = [](auto grid_cell,
sycl::kernel_handler & kh,
sycl::buffer::accessor & count_accessor,
sycl::other_buffer::accessor & buf2) {
std::cout << "Called" << std::endl;
// Do stuff, e.g.:
// sycl::atomic_ref<...> count{count_accessor[0]};
// count++;
};
grid.apply_all_occupied(
q,
function,
[&d_count_occupied](sycl::handler & cgh) { return d_count_occupied.get_access<sycl::access_mode::write>(cgh); },
[&other_buf](sycl::handler & cgh) { return other_buf.get_access<sycl::access_mode::read>(cgh); });
}
EDIT: If the flexibility of passing a full lambda to apply_all_occupied()
is not required but only the access_mode
should be specified per buffer, you can introduce an additional helper function
template <sycl::access_mode mode, class BufferT>
auto AccessAs(BufferT & buffer)
{
return [&] (sycl::handler & cgh) {
return buffer.template get_access<mode>(cgh);
};
};
and call apply_all_occupied()
like this:
grid.apply_all_occupied(
q,
function,
AccessAs<sycl::access_mode::write>(d_count_occupied),
AccessAs<sycl::access_mode::read>(other_buf));
Full example on godbolt.
You can further abbreviate this by defining
template <class BufferT>
auto AsWritableBuffer(BufferT & buffer)
{
return AccessAs<sycl::access_mode::write>(buffer);
};
template <class BufferT>
auto AsReadableBuffer(BufferT & buffer)
{
return AccessAs<sycl::access_mode::read>(buffer);
};
and using it like this
grid.apply_all_occupied(
q,
function,
AsWritableBuffer(d_count_occupied),
AsReadableBuffer(other_buf));
Full example on godbolt.
Upvotes: 2