Reputation: 91
I was trying to run code based on the tutorial at the following link:
https://documen.tician.de/pycuda/tutorial.html
The code from that tutorial ran fine.
This is my version, with similar definitions. Note that I am running it inside an engine context, since I want to call engine.execute.
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import numpy as np
from keras.datasets import mnist
dims = (1, 28, 28)
dims2 = (1, 1, 10)
batch_size = 1000
nbytes = batch_size * trt.volume(dims) * np.dtype(np.float32).itemsize
nbytes2 = batch_size * trt.volume(dims2) * np.dtype(np.float32).itemsize
self.d_src = cuda.mem_alloc(nbytes)
self.d_dst = cuda.mem_alloc(nbytes2)
bindings = [int(self.d_src), int(self.d_dst)]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
img_h = x_test.shape[1]
img_w = x_test.shape[2]
x_test = x_test.reshape(x_test.shape[0], 1, img_h, img_w)
x_test = x_test.astype('float32')
x_test /= 255
num_test = x_test.shape[0]
output_size = batch_size * trt.volume(dims2)
y = np.empty((num_test,output_size), np.float32)
for i in range(0, num_test, batch_size):
    x_part = x_test[i : i + batch_size]
    y_part = y[i : i + batch_size]
    cuda.memcpy_htod(self.d_src, x_part)
    cuda.memcpy_dtoh(y_part, self.d_dst)
However, it failed at memcpy_dtoh, even though memcpy_htod worked.
File "a.py", line 164, in infer
cuda.memcpy_dtoh(y_part, self.d_dst)
pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid argument
Why is this the case? The definitions are similar to the code in the link.
Upvotes: 4
Views: 3237
Reputation: 91
I have solved it anyway.
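The device allocation needs to be different for x_part and y_part since their sizes are different, and each host array has to match the size of its device buffer. With output_size = batch_size * trt.volume(dims2), every y_part slice was batch_size times larger than the d_dst allocation (nbytes2), so the device-to-host copy asked for more bytes than the device buffer holds.
So it works if I define output_size = trt.volume(dims2).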
The error message isn't very helpful, and it made me think I had passed the wrong arguments instead.
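For anyone hitting the same error, here is a rough sketch of the size arithmetic, using plain NumPy so it runs without a GPU. It assumes that memcpy_dtoh copies as many bytes as the host array holds, and it uses np.prod as a stand-in for trt.volume; the helper name host_slice_nbytes is made up for illustration and is not part of PyCUDA or TensorRT.

```python
import numpy as np

batch_size = 1000
dims2 = (1, 1, 10)
itemsize = np.dtype(np.float32).itemsize

# Stand-in for trt.volume(dims2): the product of the dimensions (assumption).
out_volume = int(np.prod(dims2))

# Size of the device output buffer, same formula as nbytes2 in the question.
device_nbytes = batch_size * out_volume * itemsize


def host_slice_nbytes(rows, row_length):
    """Bytes in a float32 host slice of shape (rows, row_length).

    Hypothetical helper, computed arithmetically to avoid allocating
    a large array just to read its .nbytes.
    """
    return rows * row_length * itemsize


# Original definition: output_size = batch_size * trt.volume(dims2)
bad_nbytes = host_slice_nbytes(batch_size, batch_size * out_volume)

# Fixed definition: output_size = trt.volume(dims2)
ok_nbytes = host_slice_nbytes(batch_size, out_volume)

print("device buffer:", device_nbytes)
print("y_part, original:", bad_nbytes)
print("y_part, fixed:", ok_nbytes)
```

A cheap guard before the copy, such as assert y_part.nbytes == nbytes2, would have pointed at the mismatch directly instead of the generic "invalid argument".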
Upvotes: 3