Reputation: 1
I need to specify the dtype (data type) for the output of sklearn's KernelDensity inside a kernel function (a "def block") that is passed to apply_rows in NVIDIA's RAPIDS cudf library. In Python 3.7 I can inspect the type of the object, but that type is not accepted as a data type by the RAPIDS kernel function. My code and the error message are included below so that anyone can reproduce the problem.
Here is a typical use of sklearn's KernelDensity class:
from sklearn.neighbors import KernelDensity
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(X)
kde.score_samples(X)
array([-0.41075698, -0.41075698, -0.41076071, -0.41075698, -0.41075698,
-0.41076071])
type(kde)
<class 'sklearn.neighbors.kde.KernelDensity'>
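For reference, score_samples returns the log of the estimated density at each point. The following is a minimal pure-Python sketch of that calculation, assuming a Gaussian kernel with the usual normalisation; the helper name gaussian_kde_log_density is hypothetical and is not part of sklearn. It uses only scalar math, and for the six points above it should agree with the printed values to about six decimal places.

```python
import math

def gaussian_kde_log_density(points, query, bandwidth):
    """Log of a Gaussian kernel density estimate at `query` (hypothetical helper)."""
    n = len(points)
    d = len(query)
    total = 0.0
    for p in points:
        # Squared Euclidean distance between the query and one training point.
        sq_dist = sum((q - c) ** 2 for q, c in zip(query, p))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    # Log of the normalising constant n * h^d * (2*pi)^(d/2).
    log_norm = (math.log(n) + d * math.log(bandwidth)
                + 0.5 * d * math.log(2.0 * math.pi))
    return math.log(total) - log_norm

# Same data and bandwidth as the sklearn example above.
X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
scores = [gaussian_kde_log_density(X, x, 0.2) for x in X]
print(scores)
```

This is only meant to show what the number is; it loops over every pair of points and makes no attempt at the tree-based speed-ups sklearn uses.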
Here is the NVIDIA Rapids Def block that I used with Sklearn's Kernel Density Function:
import cudf, math
import numpy as np
df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45
#Define input columns for the kernel
in1 = df['in1']
in2 = df['in2']
def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = [math.tan(i) for i in x]
        out2[i] = np.array(out1[i].to_pandas())
        out3[i] = ((KernelDensity(kernel='gaussian', bandwidth=kwarg1).fit(out2[i])).score_samples(out2[i]))
        out4[i] = [i >= kwarg2 for i in out3[i]]
Results = cudf.DataFrame()
Results = df.apply_rows(kernel, incols=['in1','in2'], outcols=dict(out1='float', out2='float64', out3='float64', out4='float'), kwargs=dict(kwarg1=0.1, kwarg2=0.33))
Here is the error message (perhaps if I get the dtype correct for x and out3, this will resolve all of the errors):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/dataframe/dataframe.py", line 2707, in apply_rows
    self, func, incols, outcols, kwargs, cache_key=cache_key
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 64, in apply_rows
    return applyrows.run(df)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 128, in run
    self.launch_kernel(df, bound.args, **launch_params)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 152, in launch_kernel
    self.kernel[blkct, blksz](*args)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 806, in __call__
    kernel = self.specialize(*args)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 817, in specialize
    kernel = self.compile(argtypes)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 833, in compile
    **self.targetoptions)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 62, in compile_kernel
    cres = compile_cuda(pyfunc, types.void, args, debug=debug, inline=inline)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 51, in compile_cuda
    locals={})
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 972, in compile_extra
    return pipeline.compile_extra(func)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 390, in compile_extra
    return self._compile_bytecode()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 903, in _compile_bytecode
    return self._compile_core()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 890, in _compile_core
    res = pm.run(self.status)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 266, in run
    raise patched_exception
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 257, in run
    stage()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 515, in stage_nopython_frontend
    self.locals)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 1124, in type_inference_stage
    infer.propagate()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py", line 927, in propagate
    raise errors[0]
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>) with argument(s) of type(s): (array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), float64, float64) * parameterized
In definition 0:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>
File "<stdin>", line 2:
<source missing, REPL/exec in use?>
raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254
In definition 1:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>
File "<stdin>", line 2:
<source missing, REPL/exec in use?>
raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>)
[2] During: typing of call at <string> (11)
File "<string>", line 11:
<source missing, REPL/exec in use?>
Upvotes: 0
Views: 656
Reputation: 4015
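The code below works. Several lines in your kernel are incompatible with cudf: apply_rows compiles the function with Numba for the GPU (note the numba.cuda frames in your traceback), so list comprehensions, .to_pandas() and scikit-learn calls cannot be used inside it, and each out[i] has to be assigned a single scalar value.
The KernelDensity call is left as a placeholder (out3[i] = 1). To implement a kernel density estimate inside the kernel you would need code that uses only operations Numba can compile.
Code: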
import cudf, math
import numpy as np
df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45
#Define input columns for the kernel
in1 = df['in1']
in2 = df['in2']
def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = math.tan(float(i))
        out2[i] = out1[i]
        out3[i] = 1 #((KernelDensity(kernel='gaussian', bandwidth=kwarg1).fit(out2[i])).score_samples(out2[i]))
        out4[i] = out3[i] >= kwarg2
Results = cudf.DataFrame()
Results = df.apply_rows(kernel, incols=['in1','in2'], outcols=dict(out1=np.float64, out2=np.float64, out3=np.float64, out4=np.float64), kwargs=dict(kwarg1=0.1, kwarg2=0.33))
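Because the kernel body now uses only scalar operations, its logic can be dry-run on the CPU with plain Python lists before it is handed to apply_rows. The sketch below does that. The lists standing in for the cudf columns are an assumption made for illustration, and passing this check does not prove that Numba will compile the function.

```python
import math

def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = math.tan(float(i))
        out2[i] = out1[i]
        out3[i] = 1  # placeholder for the density value, as in the code above
        out4[i] = out3[i] >= kwarg2

nelem = 10
# Plain lists stand in for the cudf input columns (illustration only).
in1 = [k * 1.5 for k in range(nelem)]
in2 = [k * 1.45 for k in range(nelem)]
# Preallocated outputs stand in for the columns apply_rows would create.
out1, out2, out3, out4 = ([0.0] * nelem for _ in range(4))

kernel(in1, in2, out1, out2, out3, out4, kwarg1=0.1, kwarg2=0.33)
print(out1[:3], out4[:3])
```

In this dry run out4 holds Python booleans, whereas the cudf version stores the result in a float64 column. Also note that on the GPU the function may be handed only a slice of the rows per thread, so i is not necessarily the global row number.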
Upvotes: 2