Reputation: 1
I am trying to write my first parallelized function with Numba.
Here is my code:
possible_EVENTS = np.array([""])
@njit(parallel=True,nopython=True)
def add_History(events,EVENTS):
    index=events.where(events=="Repair")
    add=np.array("")
    pre_events=events[index:]
    if ("LOMT" in pre_events) & ("DIMT" in pre_events):
        for i in pre_events:
            if "FU" not in i:
                add.append(i)
            else:
                break
        if len(add)>=5:
            possible_EVENTS.append(add)
for i in tqdm(packages):
    events=np.array(df_["Event"].loc[df_["Packages"]==i].values)
    add_History(events,possible_EVENTS)
But I get this error.
TypingError Traceback (most recent call last)
C:\Users\local_PIETAPA\Temp\ipykernel_16492\3239905456.py in <module>
16 for container in tqdm(container_search):
17 events=np.array(df_["EVENT_CODE"].loc[df_["UNIT"]==container].values)
---> 18 add_EPOS_History(events,EPOS_EVENTS)
C:\ProgramData\Anaconda3\lib\site-packages\numba\core\dispatcher.py in _compile_for_args(self, *args, **kws)
466 e.patch_message(msg)
467
--> 468 error_rewrite(e, 'typing')
469 except errors.UnsupportedError as e:
470 # Something unsupported is present in the user code, add help info
C:\ProgramData\Anaconda3\lib\site-packages\numba\core\dispatcher.py in error_rewrite(e, issue_type)
407 raise e
408 else:
--> 409 raise e.with_traceback(None)
410
411 argtypes = []
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, C)
During: typing of argument at C:\Users\local_PIETAPA\Temp\ipykernel_16492\3239905456.py (4)
File "..\..\..\local_PIETAPA\Temp\ipykernel_16492\3239905456.py", line 4:
<source missing, REPL/exec in use?>
And I have no idea how to fix it. Can you help me?
Upvotes: 0
Views: 2907
Reputation: 9
One problem in the code is how possible_EVENTS is used. It is defined as a NumPy array holding a single empty string, [""]. Inside add_History you then call append on it, but NumPy arrays have no append method.
You can work around this by using a Python list instead: append the elements to the list, then convert the list to a NumPy array at the end.
Here's an updated version of the code:
import numpy as np
from numba import njit
from tqdm import tqdm
possible_EVENTS = [""]
@njit(parallel=True,nopython=True)
def add_History(events,EVENTS):
    index=events.where(events=="Repair")
    add=[]
    pre_events=events[index:]
    if ("LOMT" in pre_events) & ("DIMT" in pre_events):
        for i in pre_events:
            if "FU" not in i:
                add.append(i)
            else:
                break
        if len(add)>=5:
            possible_EVENTS.append(add)
for i in tqdm(packages):
    events=np.array(df_["Event"].loc[df_["Packages"]==i].values)
    add_History(events,possible_EVENTS)
possible_EVENTS = np.array(possible_EVENTS)
Note that using the njit decorator with parallel=True may not always result in faster execution, and it is recommended to test the performance with and without parallelization to determine the best approach.
Upvotes: 0
Reputation: 50308
Numba complains because you are passing it dynamic Python objects, which Numba does not support (and which no similar tool can support efficiently). This is almost certainly because events is an array of objects rather than an array of strings. You need to ensure the input dtype is a string type, not object.
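As a minimal sketch of the dtype point, the snippet below builds an object array similar to what a pandas string column often yields and converts it to a fixed-width string dtype. The event codes are made up for illustration, and converting the dtype only addresses the "non-precise type" complaint; it does not guarantee that the rest of the function compiles.

```python
import numpy as np

# Hypothetical event codes stored as generic Python objects,
# which is what Numba reports as array(pyobject, 1d, C).
events_obj = np.array(["Load", "Repair", "LOMT"], dtype=object)
print(events_obj.dtype.kind)  # 'O' means object

# Convert to a fixed-width unicode dtype before passing the array on.
events_str = events_obj.astype(str)
print(events_str.dtype.kind)  # 'U' means unicode string
```

Checking events.dtype before calling the compiled function is a quick way to confirm which kind of array is actually being passed.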
There are many other issues in the code:
Note that strings are not yet efficiently supported by Numba.
Also note that parallel=True will not automagically parallelise your function. You need to use prange for that and check that it works (i.e. that it does not introduce bugs such as race conditions). You cannot easily use prange here because of the append, so parallel=True is currently useless.
Using add.append(i) is a bad idea here since add is a Numpy array. Numpy arrays have no append method, and np.append creates a new array on every call, which makes the execution time quadratic instead of linear. This is a common issue. The solution is to append the items to a list and then convert the list to a Numpy array.
Global arrays are assumed to be constant. You should not modify them (this is very bad practice in software engineering anyway). The solution is to pass the array to the function and return the new one. Indeed, np.append creates a new array; it does not modify the current array. I strongly advise you to carefully read the documentation of np.array. In the end, your current function computes nothing, even without Numba! Please check your code before trying to make it faster.
events.where(events=="Repair") traverses the whole events array, which is less efficient than iterating over the array and breaking as soon as the item is found.
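The points above can be sketched in plain Python, without Numba, to get the logic right first. This is only one possible reading of what the function is meant to do. The names find_first, add_history, min_length and events_by_package, along with the sample event codes, are assumptions made for illustration. The sketch stops searching at the first "Repair", collects into a list, and returns the result instead of modifying a global.

```python
def find_first(events, target):
    """Return the index of the first item equal to target, or -1 if absent."""
    for idx, event in enumerate(events):
        if event == target:
            return idx  # stop at the first match instead of scanning everything
    return -1


def add_history(events, min_length=5):
    """Return the events from "Repair" up to the first "FU" event, or None."""
    start = find_first(events, "Repair")
    if start < 0:
        return None
    tail = events[start:]
    if "LOMT" not in tail or "DIMT" not in tail:
        return None
    collected = []  # a list, so appending is cheap
    for event in tail:
        if "FU" in event:
            break
        collected.append(event)
    return collected if len(collected) >= min_length else None


# Hypothetical input: one list of event codes per package.
events_by_package = {
    "pkg-1": ["Load", "Repair", "LOMT", "DIMT", "Check", "Move", "FUEL", "End"],
    "pkg-2": ["Repair", "LOMT", "FUEL"],
}

# The caller owns the result list; the function never touches a global.
possible_events = []
for package, events in events_by_package.items():
    history = add_history(events)
    if history is not None:
        possible_events.append(history)
```

Once this version produces the expected histories, it becomes easier to judge whether compiling or parallelising it is worth the effort.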
Upvotes: 0