land cc

CuPy execution error in multiple array calculation

The NumPy loop works fine. The CuPy loop also works with 1 and 3 iterations, but 10 iterations raise an error. How can I fix this problem? Is this a GPU memory problem?

(source code)

import cupy as cp
import numpy as np

mc = 5000  # matrix size; each mc x mc float64 array takes 200 MB

def fcal(ff, nloop, skey):
    # ff is the array module: numpy (CPU) or cupy (GPU)
    maa = ff.zeros((mc,mc)) + 0.0
    mbb = ff.zeros((mc,mc)) + 0.0
    for jj in range(nloop):
        maa = ff.dot(maa, mbb)  # chained matrix products
    asum = ff.sum(maa)
    print("[fcal] (%s) nloop=[%2d] asum=[%s]" % (skey, nloop, asum))

fcal(np,  1, "np")
fcal(np,  3, "np")
fcal(np, 10, "np")
fcal(cp,  1, "cp")
fcal(cp,  3, "cp")
fcal(cp, 10, "cp")

(execution result)

[fcal] (np) nloop=[ 1] asum=[0.0]  
[fcal] (np) nloop=[ 3] asum=[0.0]  
[fcal] (np) nloop=[10] asum=[0.0]  
[fcal] (cp) nloop=[ 1] asum=[0.0]  
[fcal] (cp) nloop=[ 3] asum=[0.0]  
Traceback (most recent call last):  
  File "C:\testdir\2cupy_test.py", line 30, in <module>  
    fcal(cp, 10, "cp")  
  File "C:\testdir\2cupy_test.py", line 22, in fcal  
    print("[fcal] (%s) nloop=[%2d] asum=[%s]" % (skey, nloop, asum))  
  File "cupy\core\core.pyx", line 1596, in cupy.core.core.ndarray.__str__  
  File "cupy\core\core.pyx", line 1643, in cupy.core.core.ndarray.get  
  File "cupy\cuda\memory.pyx", line 372, in cupy.cuda.memory.MemoryPointer.copy_to_host  
  File "cupy\cuda\runtime.pyx", line 255, in cupy.cuda.runtime.memcpy  
  File "cupy\cuda\runtime.pyx", line 135, in cupy.cuda.runtime.check_status  
cupy.cuda.runtime.CUDARuntimeError: cudaErrorLaunchFailure: unspecified launch failure  

Luca Ferraro

There's no problem in your code: every iteration multiplies matrices that contain only zeros, so every intermediate result is zero as well. If you can run it without errors using fewer iterations, then your problem is not in the code implementation.

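You are probably running into a TDR (Timeout Detection and Recovery) error, as Robert Crovella pointed out in the comments: Windows resets the GPU driver when the GPU does not respond to the operating system within a time limit (2 seconds by default), and more iterations can delay your GPU's response to the OS. Because CuPy launches kernels asynchronously, the failure is only reported later, when print copies asum back to the host, which is why the traceback points at the print line rather than at ff.dot.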
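
On the "is this a GPU memory problem?" part of the question, a quick back-of-envelope calculation suggests memory is unlikely to be the cause. With the default float64 dtype, each 5000 × 5000 matrix takes 200 MB, and because maa is reassigned on every iteration, only a few such arrays are alive at once no matter how large nloop is. What does grow with nloop is the amount of work queued on the GPU, roughly 2 * mc**3 floating-point operations per dot. The sketch below only does this arithmetic; the helper name total_flops is illustrative, not part of any library.

```python
# Back-of-envelope numbers for the question's workload (mc = 5000).
# Assumes the default float64 dtype of numpy.zeros / cupy.zeros (8 bytes each).
mc = 5000
bytes_per_element = 8

# Memory for one mc x mc matrix; this does not depend on nloop.
matrix_bytes = mc * mc * bytes_per_element

# A dense n x n matrix product needs about 2 * n**3 floating-point operations
# (one multiply and one add per term of every inner product).
flops_per_dot = 2 * mc ** 3


def total_flops(nloop):
    """Hypothetical helper: FLOPs queued by nloop chained calls to dot."""
    return nloop * flops_per_dot


print(f"one matrix: {matrix_bytes / 1e6:.0f} MB")
for nloop in (1, 3, 10):
    print(f"nloop={nloop:2d}: {total_flops(nloop):.2e} FLOP")
```

With ten iterations the GPU has ten times as much work queued as with one, while memory use stays about the same. That fits the timeout explanation better than an out-of-memory one; an allocation failure in CuPy would normally surface as cupy.cuda.memory.OutOfMemoryError rather than cudaErrorLaunchFailure.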

If you want to check whether you are really running into a TDR problem (assuming one iteration runs without errors), try adding a simple sleep of a few seconds after each ff.dot call, so that the OS gets a response from the GPU between iterations.

I stress that this is not a solution to the TDR problem, just a simple way to detect whether you're running into it.

import time
...
for jj in range(nloop):
    maa = ff.dot(maa, mbb)
    time.sleep(10)  # pause so the GPU can finish and respond to the OS
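
A variation on the same idea, offered as a sketch: because CuPy kernels run asynchronously, timing ff.dot alone only measures how long it takes to queue the work. Calling a synchronization function after each product makes every dot finish before the next one is queued and shows how long a single product really takes; if one product alone comes close to the watchdog limit, TDR becomes very likely. The function timed_dots and its sync parameter are hypothetical names; for CuPy, cupy.cuda.Device().synchronize is one way to block until queued work is done. Like fcal in the question, the function takes the array module as a parameter, so it also runs with NumPy.

```python
import time

import numpy as np


def timed_dots(xp, mc, nloop, sync=lambda: None):
    """Hypothetical helper: run nloop chained products like fcal and time each one.

    xp is the array module (numpy or cupy). sync must block until all queued
    device work is finished; with CuPy pass cupy.cuda.Device().synchronize,
    because CuPy kernels run asynchronously. NumPy needs no sync.
    """
    maa = xp.zeros((mc, mc))
    mbb = xp.zeros((mc, mc))
    seconds = []
    for _ in range(nloop):
        start = time.perf_counter()
        maa = xp.dot(maa, mbb)
        sync()  # wait for this product to finish before reading the clock
        seconds.append(time.perf_counter() - start)
    return float(xp.sum(maa)), seconds


if __name__ == "__main__":
    # CPU run with a small matrix so the example finishes quickly.
    total, seconds = timed_dots(np, 500, 3)
    print(total, ["%.3f s" % s for s in seconds])

    # GPU run (requires CuPy and a CUDA device):
    # import cupy as cp
    # total, seconds = timed_dots(cp, 5000, 10, sync=cp.cuda.Device().synchronize)
```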
