Reputation: 369
I am trying to write my first function using numba jit, I have a pandas dataframe that I need to iterate through and find the root mean square for each 350 points, since the for loop of python is quite slow I decided to try numba jit, the code is:
@jit(nopython=True)
def find_rms(data, length):
res = []
for i in range(length, len(data)):
interval = np.array(data[i-length:i])
interval =np.power(interval, 2)
sum = interval.sum()
resI = sum/length
resI = np.sqrt(res)
res.appennd(resI)
return res
mydf = np.array(df.iloc[:]['c0'], dtype=np.float64)
df.iloc[350:]['rms'] = find_rms(mydf, 350)
I read somewhere thad I need to specify datatypes, therefore I wrote "dtype = np.float64" but I still get the error as:
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
<ipython-input-39-4d388f72efdc> in <module>
----> 1 df.iloc[350:]['rms'] = find_rms(mydf, 350.0)
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\dispatcher.py in _compile_for_args(self, *args, **kws)
346 e.patch_message(msg)
347
--> 348 error_rewrite(e, 'typing')
349 except errors.UnsupportedError as e:
350 # Something unsupported is present in the user code, add help info
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\dispatcher.py in error_rewrite(e, issue_type)
313 raise e
314 else:
--> 315 reraise(type(e), e, None)
316
317 argtypes = []
c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\six.py in reraise(tp, value, tb)
656 value = tp()
657 if value.__traceback__ is not tb:
--> 658 raise value.with_traceback(tb)
659 raise value
660
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function array>) with argument(s) of type(s): (array(float64, 1d, C))
* parameterized
In definition 0:
TypingError: array(float64, 1d, C) not allowed in a homogeneous sequence
raised from c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\typing\npydecl.py:463
In definition 1:
TypingError: array(float64, 1d, C) not allowed in a homogeneous sequence
raised from c:\users\1\appdata\local\programs\python\python35\lib\site-packages\numba\typing\npydecl.py:463
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<built-in function array>)
[2] During: typing of call at <ipython-input-34-edd252715b2d> (5)
File "<ipython-input-34-edd252715b2d>", line 5:
def find_rms(data, length):
<source elided>
for i in range(length, len(data)):
interval = np.array(data[i-length:i])
^
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile
If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
Does anybody know what the problem is?
Upvotes: 10
Views: 36754
Reputation: 926
You had a typo in append
and I think you also made a mistake with what the square root is to be taken of (I believe resI
not res
).
Other than that, the only problem was the initialization of interval
. Numba
doesn't want you to pass a numpy array to a numpy array. It doesn't help with anything to wrap the np.array
around the slice of the array, python simply doesn't care if you do that and treats the code like you didn't but Numba
in nopython mode does care and throws an error. Leaving that part out solved the problem.
@jit(nopython=True)
def find_rms(data, length):
res = []
for i in range(length, len(data)):
interval = data[i-length:i]
interval = np.power(interval, 2)
sum = interval.sum()
resI = sum/length
resI = np.sqrt(resI)
res.append(resI)
return res
mydf = np.array(df.iloc[:]['c0'], dtype=np.float64)
target = find_rms(mydf, 350)
Upvotes: 2