Reputation: 1002
I am using Windows 7 platform.
I describe below, step by step, all the routines I perform to get the .dll file (PASS), dyn.load it in R (PASS), and invoke the .Call function in R (FAIL).
When invoking .Call I get:
> out<- .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m))
Error in .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m)) :
C symbol name "rowAND" not in load table
1) Below is the source code:
#include <stdio.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <device_launch_parameters.h>
#include <R.h>
#include <Rdefines.h>
#include "cuPrintf.cuh"
#include "cuPrintf.cu"
#include "cuRow.h"
#include "cuError.h"

extern "C" {

SEXP rowAND(SEXP x, SEXP r_nrow, SEXP r_ncol) {
    // input:
    //   x      = as.integer(t(m)), vector of integers from R
    //            (t(m) because R stores matrices column-major, so this gives m row by row)
    //   r_nrow = nrow(m), scalar
    //   r_ncol = ncol(m), scalar
    //x = coerceVector(x, INTSXP); // force coercion to an integer vector

    // dimensions
    int nrow = asInteger(r_nrow);
    int ncol = asInteger(r_ncol);
    size_t m_size;
    size_t calc_size;
    m_size = nrow * ncol * sizeof(int); // m (input)
    calc_size = nrow * sizeof(int);     // change to nrow/ncol depending on calculation (output)

    // R result
    SEXP r;
    PROTECT(r = allocMatrix(INTSXP, nrow, 1));

    // cuda error variable
    cudaError_t err;

    // HOST pointers
    int *h_m = INTEGER(x);
    int *h_calc = INTEGER(r);

    // allocate DEVICE memory
    int *d_m = NULL, *d_calc = NULL;
    err = cudaMalloc((void **)&d_m, m_size); checkError(err);
    err = cudaMalloc((void **)&d_calc, calc_size); checkError(err);

    // copy host matrix to device
    err = cudaMemcpy(d_m, h_m, m_size, cudaMemcpyHostToDevice); checkError(err);

    // Initialize cuPrintf -- DEBUGGING
    cudaPrintfInit();

    dim3 numBlocks(nrow, 1, 1);    // blocks
    dim3 threadsPerBlock(1, 1, 1); // 1 thread per block
    rowOR<<<numBlocks, threadsPerBlock, 0, 0>>>(d_m, d_calc, ncol); // main call

    // Terminate cuPrintf -- DEBUGGING
    cudaPrintfDisplay(stdout, true);
    cudaPrintfEnd();

    err = cudaGetLastError(); checkError(err);

    // Copy the device result vector back to the host result vector
    err = cudaMemcpy(h_calc, d_calc, calc_size, cudaMemcpyDeviceToHost); checkError(err);

    // Free device global memory
    err = cudaFree(d_m); checkError(err);
    err = cudaFree(d_calc); checkError(err);

    // Reset the device
    err = cudaDeviceReset();

    UNPROTECT(1);
    return r;
}

} // extern "C"
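The rowOR kernel launched above is declared in cuRow.h (not shown). For context, a matching kernel could look roughly like the sketch below, assuming one block per row and the row-major layout produced by as.integer(t(m)); the body is only an illustration, not the actual contents of cuRow.h:

// Hypothetical body for the rowOR kernel launched above with
// nrow blocks of 1 thread each.
__global__ void rowOR(const int *d_m, int *d_calc, int ncol)
{
    int row = blockIdx.x;              // one block per matrix row
    int acc = 0;                       // logical OR accumulator
    for (int j = 0; j < ncol; ++j)     // d_m is row-major: element (row, j)
        acc = acc || d_m[row * ncol + j];
    d_calc[row] = acc;                 // one result per row
}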
2) I compile the .cu file using nvcc, which generates the object file (.obj). Then I link the libraries (PASS), no problem here, and the build generates the .dll file.
3) When I load the .dll using the R command dyn.load, it PASSES.
The loaded .dll appears in getLoadedDLLs():
> getLoadedDLLs()
Filename Dynamic.Lookup
base base FALSE
methods C:/Revolution/R-Community-6.2/R-2.15.3/library/methods/libs/i386/methods.dll FALSE
Revobase C:/Revolution/R-Community-6.2/R-2.15.3/library/Revobase/libs/i386/Revobase.dll TRUE
tools C:/Revolution/R-Community-6.2/R-2.15.3/library/tools/libs/i386/tools.dll FALSE
grDevices C:/Revolution/R-Community-6.2/R-2.15.3/library/grDevices/libs/i386/grDevices.dll FALSE
stats C:/Revolution/R-Community-6.2/R-2.15.3/library/stats/libs/i386/stats.dll FALSE
cuRow C:/Users/msn/Documents/Visual Studio 2010/Projects/R_C/R_C/Debug/cuRow.dll TRUE
4) HERE COMES THE PROBLEM: When I check whether the function rowAND is loaded, I get FALSE:
> is.loaded("rowAND")
[1] FALSE
>
Thus, obviously, it fails when I run .Call (because it is not loaded):
> path.dll<-'C:/Users/msn/Documents/Visual Studio 2010/Projects/R_C/R_C/Debug'
> dyn.load(file.path(path.dll,paste0("cuRow", .Platform$dynlib.ext)))
> nrow<-10
> ncol<-3
> m<-matrix(sample(c(0,1),nrow*ncol,replace=TRUE),nrow,ncol)
> out<- .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m))
Error in .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m)) :
C symbol name "rowAND" not in load table
I see that the function appears to be correctly defined in the source code, but it can't be "seen" in the loaded library.
What am I missing here? Thanks in advance!
EDIT:
Based on @Dirk's partial answer, I will try to write a CUDA DLL project that is called from C, so that I can compile the target C source using the standard R CMD SHLIB.
That is: a C DLL, deployed to R, which calls the CUDA DLL internally.
I will update when done!
EDIT 2:
I answered my own question below. I finally got a CUDA implementation working in R (Windows platform).
Upvotes: 3
Views: 2219
Reputation: 2131
There are several important things to watch when compiling CUDA code for R with Visual Studio on Windows.
Declare the C function with the __declspec(dllexport) keyword, together with extern "C" (as sketched after this list):
extern "C" __declspec(dllexport)
Build for the same architecture as R (32- or 64-bit); otherwise, loading the DLL in R will fail with:
Load Library failure: %1 is not a valid Win32 application.
Add the required link libraries under Solution Explorer → Project name → Properties → Linker → Input → Additional Dependencies.
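Applied to the function from the question, the exported declaration could look like the minimal sketch below (only the signature is taken from the question; everything else is illustrative). Without __declspec(dllexport) (or an equivalent .def file), the MSVC linker does not place rowAND in the DLL's export table, which matches the "C symbol name not in load table" error.

#include <R.h>
#include <Rdefines.h>

// extern "C" keeps the symbol name unmangled ("rowAND" rather than a
// C++ decorated name); __declspec(dllexport) makes the MSVC linker
// export it from the DLL so that R's dynamic lookup can find it.
extern "C" __declspec(dllexport)
SEXP rowAND(SEXP x, SEXP r_nrow, SEXP r_ncol);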
For other detailed steps, you can refer to the NVIDIA blog and ParallelR.
Upvotes: 1
Reputation: 1002
I decided to post an answer to my own question, for those who are experiencing the same difficulties. I can categorize the answer as a workaround to the problem.
At the end of the day, my problem was to implement CUDA GPU parallelism in R on the Windows platform.
I see that the majority of CRAN packages (not to say all) implementing CUDA have NO binaries for the Windows platform. In other words, if you try to build them from source on Windows, it fails. I guess they haven't been built for Windows because it is tricky to compile and link .cu files on Windows using MinGW and the nvcc compiler together.
NVIDIA has VS2010 as the main platform for Windows development, and the Eclipse plug-in is only supported on Linux. Although the nvcc compiler supports the -ccbin option, which can make it call gcc, configuring that toolchain is really tricky.
My workaround was to develop a DLL project in VS2010, and to compile and link the DLL using the native VS2010 compiler/linker, which is cl. This DLL is the piece that internally runs the CUDA GPU parallelism.
After compiling it in VS2010, I loaded the DLL using dyn.load() and called its functions using .C in R. It finally worked, and at the end of the day I could deploy CUDA GPU parallelism functionality to R on a Windows platform.
I could deploy the same .dll in a package, using a NAMESPACE, and provide the DLL source code inside the CRAN tarball, so as not to infringe open-source policies. Anyway, it is a workaround.
Two important factors (illustrated in the sketch after this list):
1) Expose all exported functions in native C, using extern "C".
2) Treat all input variables of the functions as pointers, since this is mandatory when using .C calls.
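As an illustration of both factors, a .C-compatible entry point could look like the sketch below (the function and kernel names are illustrative, and error checks are omitted for brevity): every argument arrives as a pointer, the function returns void, and the result is written back through the last pointer argument.

#include <cuda_runtime.h>

// Illustrative kernel: logical AND across each row of a row-major
// nrow x ncol integer matrix, one block per row.
__global__ void rowAND_kernel(const int *m, int *out, int ncol)
{
    int row = blockIdx.x;
    int acc = 1;
    for (int j = 0; j < ncol; ++j)
        acc = acc && m[row * ncol + j];
    out[row] = acc;
}

// Factor 1: C linkage plus export from the DLL.
// Factor 2: every argument is a pointer, as required by .C;
//           the result is written into 'result'.
extern "C" __declspec(dllexport)
void rowAND_C(int *m, int *nrow, int *ncol, int *result)
{
    size_t m_size   = (size_t)(*nrow) * (*ncol) * sizeof(int);
    size_t out_size = (size_t)(*nrow) * sizeof(int);

    int *d_m = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_m, m_size);
    cudaMalloc((void **)&d_out, out_size);
    cudaMemcpy(d_m, m, m_size, cudaMemcpyHostToDevice);

    rowAND_kernel<<<*nrow, 1>>>(d_m, d_out, *ncol);

    cudaMemcpy(result, d_out, out_size, cudaMemcpyDeviceToHost);
    cudaFree(d_m);
    cudaFree(d_out);
}

On the R side this could then be invoked roughly as .C("rowAND_C", as.integer(t(m)), as.integer(nrow(m)), as.integer(ncol(m)), integer(nrow(m)))[[4]], with the result coming back through the fourth argument.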
Upvotes: 4
Reputation: 368241
What I would do in your case is to look very closely at the existing R packages for CUDA which are publicly available on CRAN, as they provide working implementations. I believe at least some of these build on Windows too.
Among the CRAN packages using CUDA are
and more. See the CRAN Task View on High-Performance Computing for more.
I am most familiar with the first (and oldest) one. It uses one layer of code to call from R to C, and then another to call from C to the CUDA-enabled code compiled with NVIDIA's compiler frontend. The last one uses Rcpp for the passage from R to C/C++. I suspect your error is due to trying to skip one step.
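To make that layering concrete, here is a minimal sketch (file and function names are illustrative, not taken from any of those packages): a plain C translation unit that R CMD SHLIB can build exposes the R entry point and forwards to a routine implemented in a separate .cu file compiled with nvcc, which must be declared extern "C" on the CUDA side so the two link together.

/* wrapper.c -- buildable with R CMD SHLIB (illustrative name) */
#include <R.h>
#include <Rinternals.h>

/* Implemented in a separate nvcc-compiled .cu file and linked in;
   no CUDA headers or syntax appear in this translation unit. */
extern void rowAND_cuda(const int *m, int nrow, int ncol, int *out);

SEXP rowAND(SEXP x, SEXP r_nrow, SEXP r_ncol)
{
    int nrow = asInteger(r_nrow);
    int ncol = asInteger(r_ncol);
    SEXP r = PROTECT(allocVector(INTSXP, nrow)); /* result, one value per row */
    rowAND_cuda(INTEGER(x), nrow, ncol, INTEGER(r));
    UNPROTECT(1);
    return r;
}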
Upvotes: 2