macman

Reputation: 91

Question about pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid argument

I was trying to run code based on the PyCUDA tutorial:

https://documen.tician.de/pycuda/tutorial.html

The code from that tutorial ran fine.

This is my version, with similar definitions. Note that I am running it inside an engine context because I want to call an engine.execute function.

import pycuda.driver as cuda 
import pycuda.autoinit 
import tensorrt as trt 

import numpy as np
from keras.datasets import mnist 

dims = (1, 28, 28) 
dims2 = (1, 1, 10) 
batch_size = 1000 

nbytes = batch_size * trt.volume(dims) * np.dtype(np.float32).itemsize 
nbytes2 = batch_size * trt.volume(dims2) * np.dtype(np.float32).itemsize 

self.d_src  = cuda.mem_alloc(nbytes) 
self.d_dst = cuda.mem_alloc(nbytes2) 

bindings = [int(self.d_src), int(self.d_dst)] 

(x_train, y_train), (x_test, y_test) = mnist.load_data()

img_h = x_test.shape[1]
img_w = x_test.shape[2]

x_test = x_test.reshape(x_test.shape[0], 1, img_h, img_w)

x_test = x_test.astype('float32')
x_test /= 255
num_test = x_test.shape[0]

output_size = batch_size * trt.volume(dims2)

y = np.empty((num_test,output_size), np.float32)

for i in range(0, num_test, batch_size): 
     x_part = x_test[i : i + batch_size] 
     y_part = y[i : i + batch_size] 
     cuda.memcpy_htod(self.d_src, x_part) 

     cuda.memcpy_dtoh(y_part, self.d_dst) 

However, it failed at memcpy_dtoh, even though memcpy_htod worked.

File "a.py", line 164, in infer
    cuda.memcpy_dtoh(y_part, self.d_dst)
pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid argument

Why does this happen? The definitions are similar to those in the tutorial.

Upvotes: 4

Views: 3237

Answers (1)

macman

Reputation: 91

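I have solved it.

The host array passed to memcpy_dtoh has to be the same size as the device allocation it copies from, and here it was not. output_size already included batch_size, so each row of y held 10,000 floats, and y_part (a slice of 1000 such rows) was 40,000,000 bytes. self.d_dst was allocated with nbytes2, which is only 40,000 bytes.

So it works if I define output_size = trt.volume(dims2). Each row of y then holds 10 floats, and y_part is 40,000 bytes, matching self.d_dst.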

The error message isn't very helpful, and it made me think I had passed the wrong arguments.
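For anyone who wants to check the sizes without a GPU, here is a minimal sketch of the arithmetic using NumPy only. It assumes trt.volume(dims2) is 10 (the product of (1, 1, 10)), and the names out_volume, device_nbytes, and host_batch_nbytes are made up for this illustration; they are not part of PyCUDA or TensorRT.

```python
import numpy as np

batch_size = 1000
out_volume = 10  # assumed value of trt.volume((1, 1, 10)) from the question
itemsize = np.dtype(np.float32).itemsize

# Size of the device output buffer, computed the same way as nbytes2.
device_nbytes = batch_size * out_volume * itemsize


def host_batch_nbytes(output_size, num_test=batch_size):
    """Bytes in one batch-sized slice of the host output array y (hypothetical helper)."""
    y = np.empty((num_test, output_size), np.float32)
    return y[0:batch_size].nbytes


wrong = host_batch_nbytes(batch_size * out_volume)  # output_size as in the question
right = host_batch_nbytes(out_volume)               # output_size after the fix

print(device_nbytes, wrong, right)  # 40000 40000000 40000
```

Comparing y_part.nbytes against the allocation size right before the memcpy_dtoh call would have pointed at the mismatch directly, instead of the generic "invalid argument".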

Upvotes: 3
