ash291

Reputation: 1

Sklearn Kernel Density Data Type

I need to specify the dtype (data type) for sklearn's KernelDensity inside a kernel function that I pass to apply_rows from NVIDIA's RAPIDS cuDF library. In Python 3.7 I can look up the type information, but for some reason it is not accepted as a data type inside the cuDF kernel. My code and the full error message are below so that anyone can reproduce the error.

Here is a typical use of sklearn's KernelDensity, which works as expected:

from sklearn.neighbors import KernelDensity
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(X)
kde.score_samples(X)
# array([-0.41075698, -0.41075698, -0.41076071, -0.41075698, -0.41075698,
#        -0.41076071])

type(kde)
# <class 'sklearn.neighbors.kde.KernelDensity'>
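For reference, score_samples returns the log of the estimated density at each sample. The NumPy-only sketch below reproduces the numbers above for the Gaussian kernel with the Euclidean metric (sklearn's defaults). It is an illustration rather than sklearn's actual code path; gaussian_kde_log_density is a hypothetical name.

```python
import numpy as np

def gaussian_kde_log_density(X, samples, bandwidth):
    """Hypothetical helper: log-density of `samples` under a Gaussian KDE fitted on X."""
    X = np.asarray(X, dtype=np.float64)
    samples = np.asarray(samples, dtype=np.float64)
    n, d = X.shape
    # Squared Euclidean distance from every sample to every training point: shape (m, n)
    sq_dist = ((samples[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Log of the unnormalised Gaussian kernel values
    log_k = -sq_dist / (2.0 * bandwidth ** 2)
    # Log-sum-exp over training points, to avoid exp() underflowing for far-away points
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    # Normalising constant of a d-dimensional Gaussian with std = bandwidth
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    return log_sum - np.log(n) + log_norm

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
log_dens = gaussian_kde_log_density(X, X, bandwidth=0.2)
```

Nothing in it depends on sklearn classes, but it still relies on NumPy array broadcasting, so it is a CPU reference, not something that can be called inside a cuDF kernel as written.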

Here is the cuDF kernel function (run with apply_rows) in which I tried to use sklearn's KernelDensity:

import cudf, math
import numpy as np
from sklearn.neighbors import KernelDensity

df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45


#Define input columns for the kernel

in1 = df['in1']
in2 = df['in2']

def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = [math.tan(i) for i in x]
        out2[i] = np.array(out1[i].to_pandas())
        out3[i] = ((KernelDensity(kernel='gaussian', bandwidth=kwarg1).fit(out2[i])).score_samples(out2[i]))
        out4[i] = [i >= kwarg2 for i in out3[i]]

Results = cudf.DataFrame()
Results = df.apply_rows(kernel, incols=['in1','in2'], outcols=dict(out1='float', out2='float64', out3='float64', out4='float'), kwargs=dict(kwarg1=0.1, kwarg2=0.33))

Here is the error message (perhaps if I get the dtype correct for x and out3, this will resolve all of the errors):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/dataframe/dataframe.py", line 2707, in apply_rows
    self, func, incols, outcols, kwargs, cache_key=cache_key
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 64, in apply_rows
    return applyrows.run(df)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 128, in run
    self.launch_kernel(df, bound.args, **launch_params)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 152, in launch_kernel
    self.kernel[blkct, blksz](*args)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 806, in __call__
    kernel = self.specialize(*args)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 817, in specialize
    kernel = self.compile(argtypes)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 833, in compile
    **self.targetoptions)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 62, in compile_kernel
    cres = compile_cuda(pyfunc, types.void, args, debug=debug, inline=inline)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 51, in compile_cuda
    locals={})
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 972, in compile_extra
    return pipeline.compile_extra(func)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 390, in compile_extra
    return self._compile_bytecode()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 903, in _compile_bytecode
    return self._compile_core()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 890, in _compile_core
    res = pm.run(self.status)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 266, in run
    raise patched_exception
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 257, in run
    stage()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 515, in stage_nopython_frontend
    self.locals)
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 1124, in type_inference_stage
    infer.propagate()
  File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py", line 927, in propagate
    raise errors[0]
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>) with argument(s) of type(s): (array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), float64, float64) * parameterized

In definition 0:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>

File "<stdin>", line 2:
<source missing, REPL/exec in use?>

raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254

In definition 1:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>

File "<stdin>", line 2:
<source missing, REPL/exec in use?>

raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>)
[2] During: typing of call at <string> (11)


 File "<string>", line 11:
 <source missing, REPL/exec in use?>

Upvotes: 0

Views: 656

Answers (1)

keiv.fly

Reputation: 4015

The code below works. Several of your lines are incompatible with cuDF:

  1. Using i as a value, rather than only as an index into the columns, does not work: inside the kernel it is not the row number (here it is always zero), so out1 ends up all zeros.
  2. Classes from sklearn cannot be used in Numba's nopython mode, which is how cuDF compiles the kernel. The same is true for any library that Numba does not explicitly support. I do not know of any Numba-supported library that includes kernel density estimation; NumPy is supported, but it has no kernel density estimator.
  3. df.apply_rows() cannot apply a function across several rows at once, which you need in order to calculate a kernel density. You probably need df.apply_chunks() instead.
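A minimal sketch of point 1, assuming the intent was to take the tangent of each row's value: read the value from x and use i only to index the output columns. The kernel body is ordinary Python, so it can also be sanity-checked on the CPU with NumPy arrays; the cudf part only runs on a machine with RAPIDS installed, and no kernel density is computed here.

```python
import math
import numpy as np

try:
    import cudf  # only available where RAPIDS is installed
except ImportError:
    cudf = None

def kernel(in1, in2, out1, out2, kwarg1):
    # x and y hold the current row's values; i is used only as an index
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = math.tan(x)
        out2[i] = out1[i] >= kwarg1  # comparison result stored as 0.0 / 1.0

# CPU check: call the same function directly on NumPy arrays
in1 = np.arange(10) * 1.5
in2 = np.arange(10) * 1.45
out1 = np.empty(10)
out2 = np.empty(10)
kernel(in1, in2, out1, out2, 0.33)

if cudf is not None:
    df = cudf.DataFrame()
    df['in1'] = in1
    df['in2'] = in2
    result = df.apply_rows(kernel, incols=['in1', 'in2'],
                           outcols=dict(out1=np.float64, out2=np.float64),
                           kwargs=dict(kwarg1=0.33))
```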

To implement kernel density estimation you will need to:

  1. Use df.apply_chunks().
  2. Write a custom function that calculates the kernel density. You could use parts of the KernelDensity source code to build it.
  3. Make the custom function apply a kernel to a np.array so that it computes the value for every window.
  4. Set up apply_chunks() so that the chunks are rolling windows.
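Steps 2 to 4 could look roughly like the CPU sketch below, which uses only plain loops and the math module (the style Numba's nopython mode can compile). It assumes a one-dimensional trailing window over a single column is what you want; rolling_gaussian_kde, window and bandwidth are hypothetical names, and wiring it into apply_chunks (chunk sizes, per-thread indexing) is not shown.

```python
import math
import numpy as np

def rolling_gaussian_kde(values, out, window, bandwidth):
    """Hypothetical helper: out[i] = log-density of values[i] under a 1-D
    Gaussian KDE fitted on the trailing window values[i-window+1 .. i]."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)  # 1-D Gaussian normaliser
    for i in range(len(values)):
        start = max(0, i - window + 1)  # shorter window at the start of the column
        total = 0.0
        for j in range(start, i + 1):
            z = (values[i] - values[j]) / bandwidth
            total += math.exp(-0.5 * z * z)
        out[i] = math.log(norm * total / (i + 1 - start))

same = np.array([0.0, 0.0, 0.0])
out_same = np.empty_like(same)
rolling_gaussian_kde(same, out_same, window=2, bandwidth=1.0)

far = np.array([0.0, 10.0])
out_far = np.empty_like(far)
rolling_gaussian_kde(far, out_far, window=2, bandwidth=1.0)
```

Writing into a preallocated out array instead of returning a new one mirrors how cuDF kernels fill their output columns.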

Code:

import cudf, math
import numpy as np

df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45


#Define input columns for the kernel

in1 = df['in1']
in2 = df['in2']

def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = math.tan(float(i))  # i is not the row number (see point 1), so this is all zeros
        out2[i] = out1[i]
        # Placeholder: KernelDensity(kernel='gaussian', bandwidth=kwarg1) cannot run here (see point 2)
        out3[i] = 1
        out4[i] = out3[i] >= kwarg2  # boolean result stored as 0.0 / 1.0

Results = df.apply_rows(kernel,
                        incols=['in1', 'in2'],
                        outcols=dict(out1=np.float64, out2=np.float64, out3=np.float64, out4=np.float64),
                        kwargs=dict(kwarg1=0.1, kwarg2=0.33))

Upvotes: 2
