Reputation: 2703
I am relatively new to Python and am trying to use the multiprocessing module for my for loop.
I have an array of image URLs stored in img_urls which I need to download and run through Google Vision.
if __name__ == '__main__':
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))
This is my runAll() method:
def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()
    print("Image URLS {}", len(img_urls))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1
    start_timeProcess = time.time()
    pool = multiprocessing.Pool()
    pool.map(annotate, img_urls)
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess - start_timeProcess)
    print(full_matching_pages)
def annotate(img_path):
    """Returns web annotations given the path to an image."""
    file = requests.get(img_path).content
    print("file is", file)
    print('Process Working under ', os.getpid())
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    report(web_detection)
This is the warning I get when I run it, and Python crashes:
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
Upvotes: 196
Views: 104598
Reputation: 61
I was facing this issue on macOS; the following flag worked for me:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
Upvotes: 3
Reputation: 354
The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
solution didn't work for me. Another potential solution is setting no_proxy=*
in your script's environment, as described here.
Besides the causes covered by others, this error message can also be networking related. My script has a TCP server. I don't even use a pool, just os.fork
and multiprocessing.Queue
for message passing. The forks worked fine until I added the queue.
Setting no_proxy by itself fixed it in my case. If your script has networking components, try this fix, perhaps in combination with OBJC_DISABLE_INITIALIZE_FORK_SAFETY.
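As a sketch, the variable can be set from inside the script itself, before any forking or network calls happen (the sanity check using urllib's proxy-bypass helper is my addition, not part of the original fix):

```python
import os
import urllib.request

# Disable proxy lookups entirely; on macOS, proxy resolution can touch
# Objective-C frameworks, which is what trips the fork-safety check.
os.environ["no_proxy"] = "*"

# Sanity check: with no_proxy="*", every host bypasses the proxy.
print(bool(urllib.request.proxy_bypass_environment("example.com")))  # True
```

Setting it in the process environment this way also propagates to forked children, since fork() inherits the parent's environment.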
Upvotes: 8
Reputation: 2536
Running macOS with zsh, I had to add the following to my .zshrc file:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
and then, on the command line:
source ~/.zshrc
Then it worked.
Upvotes: 51
Reputation: 11368
The solution that works for me without the OBJC_DISABLE_INITIALIZE_FORK_SAFETY
flag in the environment involves initializing the multiprocessing.Pool
class right after the main()
program starts.
This is most likely not the fastest solution possible, and I am not sure if it works in all situations; however, pre-warming the worker processes early enough before my program starts does not result in any ... may have been in progress in another thread when fork() was called
errors, and I do get a significant performance boost compared to what I get with non-parallelized code.
I have created a convenience class Parallelizer
which I start very early and then use throughout the lifecycle of my program. The full version can be found here.
# entry point to my program
def main():
    parallelizer = Parallelizer()
    ...
Then whenever you want to have parallelization:
# this function is parallelized. it is run by each child process.
def processing_function(input):
    ...
    return output

...
inputs = [...]
results = parallelizer.map(
    inputs,
    processing_function
)
And the parallelizer class:
class Parallelizer:
    def __init__(self):
        self.input_queue = multiprocessing.Queue()
        self.output_queue = multiprocessing.Queue()
        self.pool = multiprocessing.Pool(multiprocessing.cpu_count(),
                                         Parallelizer._run,
                                         (self.input_queue, self.output_queue,))

    def map(self, contents, processing_func):
        size = 0
        for content in contents:
            self.input_queue.put((content, processing_func))
            size += 1
        results = []
        while size > 0:
            result = self.output_queue.get(block=True)
            results.append(result)
            size -= 1
        return results

    @staticmethod
    def _run(input_queue, output_queue):
        while True:
            content, processing_func = input_queue.get(block=True)
            result = processing_func(content)
            output_queue.put(result)
One caveat: the parallelized code might be difficult to debug, so I have also prepared a non-parallelizing version of my class, which I enable when something goes wrong in the child processes:
class NullParallelizer:
    @staticmethod
    def map(contents, processing_func):
        results = []
        for content in contents:
            results.append(processing_func(content))
        return results
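As a usage sketch, a hypothetical DEBUG toggle (my addition, not part of the original classes) lets you swap between the two implementations; the sequential version is repeated here so the snippet runs standalone:

```python
# Sequential fallback: gives ordinary, readable tracebacks when
# processing_function misbehaves in a worker.
class NullParallelizer:
    @staticmethod
    def map(contents, processing_func):
        return [processing_func(c) for c in contents]

def processing_function(x):
    return x * 2

DEBUG = True
parallelizer = NullParallelizer()  # swap back to Parallelizer() once DEBUG is off
print(parallelizer.map([1, 2, 3], processing_function))  # [2, 4, 6]
```

Because both classes expose the same map(contents, processing_func) shape, the rest of the program does not need to change when toggling.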
Upvotes: 2
Reputation: 2431
The other answers are telling you to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES,
but don't do this! You're just putting sticky tape on the warning light. You may need it on a case-by-case basis for some legacy software, but certainly do not set it in your .bash_profile!
This is fixed in https://bugs.python.org/issue33725 (Python 3.8+), but it's best practice to use
with multiprocessing.get_context("spawn").Pool() as pool:
    pool.map(annotate, img_urls)
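A minimal self-contained sketch of this, with a placeholder worker standing in for the question's annotate function (the URLs here are dummies), looks like:

```python
import multiprocessing

def annotate(img_url):
    # placeholder for the real download + Google Vision call
    return len(img_url)

if __name__ == "__main__":
    img_urls = ["https://example.com/a.jpg", "https://example.com/bb.jpg"]
    # "spawn" starts fresh interpreter processes instead of fork()ing the
    # current one, which sidesteps macOS's Objective-C fork-safety check.
    with multiprocessing.get_context("spawn").Pool() as pool:
        print(pool.map(annotate, img_urls))
```

Note that with spawn, the worker function must be importable from the main module, which is another reason to keep the if __name__ == "__main__": guard.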
Upvotes: 26
Reputation: 6515
This error occurs because of added security restricting multiprocessing in macOS High Sierra and later versions of macOS. I know this answer is a bit late, but I solved the problem using the following method:
Set an environment variable in .bash_profile
(or .zshrc
for recent macOS) to allow forking applications or scripts under the new macOS High Sierra security rules.
Open a terminal:
$ nano .bash_profile
Add the following line to the end of the file:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
Save, exit, close the terminal, and re-open it. Check that the environment variable is now set:
$ env
You will see output similar to:
TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/pn/vasdlj3ojO#OOas4dasdffJq/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.E7qLFJDSo/Render
TERM_PROGRAM_VERSION=404
TERM_SESSION_ID=NONE
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
You should now be able to run your Python script with multiprocessing.
Upvotes: 426