user10945353

Does os.fork() pick up where it left off?

I have a function where certain data is being processed, and if the data meets certain criteria, it should be handled separately while the rest of the data continues to be processed.

As an arbitrary example: say I'm scraping a web page and collecting all the attributes of its elements. If one of the elements is a form that happens to be hidden, I want to handle it separately while the rest of the elements continue being processed:

import os

def get_hidden_forms(element_att):
    if element_att == 'hidden':
        os.fork()
        # handle this separately
        pass
    else:
        # continue handling any elements that are not hidden
        pass
    # join both processes

Can this be done with os.fork() or is it intended for another purpose?

I know that os.fork() copies the entire calling process, but I could just change values before forking, as stated in this post.

Upvotes: 0

Views: 97

Answers (1)

Ondrej K.

Reputation: 9679

fork() basically creates a clone of the calling process, with its own (copied) address space and a new PID.
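A minimal sketch of what the separate address space means in practice, assuming a POSIX system (os.fork() is not available on Windows). The child changes its copy of a list and reports the new length through its exit code, while the parent's list stays untouched. The list `data` and using the exit code as a reporting channel are illustrative choices, not something from the question.

```python
import os

data = ["original"]

pid = os.fork()
if pid == 0:
    # Child: modifies its own copy of `data`; the parent never sees this change.
    data.append("changed in child")
    os._exit(len(data))  # report the child's list length via its exit code
else:
    # Parent: wait for (and reap) the child, then show its own copy is unchanged.
    _, status = os.waitpid(pid, 0)
    print("child saw", os.WEXITSTATUS(status), "items; parent sees", data)
    # prints: child saw 2 items; parent sees ['original']
```

This is also why "changing values before forking" works: whatever you set before the call is copied into the child, but nothing either process changes afterwards is shared.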

From that point on, both processes continue running from the next instruction after the fork() call. To tell them apart, you normally inspect fork()'s return value and decide what the appropriate action is. If it returns an int greater than 0, that is the PID of the child process and you know you are in the parent, so you continue with the parent's work. If it returns 0, you are in the child process and should do the child's work. At the C level, a value less than 0 means fork has failed; Python handles that by raising OSError, which you should handle (in that case you are still in the original process and no child exists).
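A sketch of that return-value dispatch, assuming a POSIX system; `run_in_child` and `work` are hypothetical names. The child leaves through os._exit() so it never falls back into the parent's code path, and output is flushed before forking so buffered text is not duplicated in the child.

```python
import os
import sys
import traceback


def run_in_child(work):
    """Fork; run work() in the child and return the child's PID in the parent."""
    # Flush first, otherwise the child inherits a copy of any unwritten
    # buffered output and may print it a second time.
    sys.stdout.flush()
    sys.stderr.flush()
    # os.fork() raises OSError on failure; we are then still in the
    # original process and no child exists.
    pid = os.fork()
    if pid == 0:
        # Child (fork returned 0): do the child's work, then exit via
        # os._exit() so it never continues into the parent's code.
        status = 1
        try:
            work()
            status = 0
        except BaseException:
            traceback.print_exc()
        finally:
            try:
                # os._exit() skips normal cleanup, so flush by hand.
                sys.stdout.flush()
                sys.stderr.flush()
            finally:
                os._exit(status)
    # Parent (fork returned the child's PID, > 0): carry on here.
    return pid


if __name__ == "__main__":
    child = run_in_child(lambda: print("child", os.getpid(), "handles the hidden form"))
    print("parent", os.getpid(), "continues with the other elements")
    _, status = os.waitpid(child, 0)  # reap the child so it does not stay a zombie
    print("child exit code:", os.WEXITSTATUS(status))
```

The order of the "child" and "parent" lines is not deterministic, since both processes run concurrently after the fork.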

Now, the absolute minimum you need to take care of after forking a child process is to make sure you wait() for it and reap its exit status properly; otherwise you will (at least temporarily) create zombies. In practice that means you may want to implement a SIGCHLD handler that reaps your process's children as they finish executing.
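One way such a handler could look, sketched under the assumption of a POSIX system and a single-threaded program (Python only lets the main thread install signal handlers). It loops over os.waitpid(-1, os.WNOHANG) because several children may exit while only one SIGCHLD is delivered. The `finished` dictionary is a hypothetical place to record exit codes.

```python
import os
import signal
import time

# Hypothetical bookkeeping: PID -> exit code (or -signal number if killed).
finished = {}


def reap_children(signum, frame):
    """SIGCHLD handler: reap every child that has already finished, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has finished yet
        if os.WIFEXITED(status):
            finished[pid] = os.WEXITSTATUS(status)
        else:
            finished[pid] = -os.WTERMSIG(status)


signal.signal(signal.SIGCHLD, reap_children)

if __name__ == "__main__":
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit straight away with a recognisable code
    while pid not in finished:  # parent: the handler fills `finished` asynchronously
        time.sleep(0.05)
    print("reaped child", pid, "with exit code", finished[pid])
```

Note that a handler reaping every child also reaps children started by other code (for example the subprocess module), which then cannot collect their status. If you never need exit codes, setting SIGCHLD to signal.SIG_IGN also prevents zombies on Linux, because the kernel then discards terminated children itself.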

In theory you could use it the way you've described, but it is rather low-level (and uncomfortable) for that. It would probably be easier to write, read, and understand if you put what you want to handle separately into dedicated code and used multiprocessing to run that extra work in separate processes.
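A sketch of the question's scenario using multiprocessing instead of a bare fork: hidden elements are handed to a process pool, the remaining elements are processed in the main process in the meantime, and collecting the results plays the role of "join both processes". The element dictionaries and the `handle_hidden`/`handle_visible` functions are hypothetical stand-ins for the real scraping code.

```python
from multiprocessing import Pool


def handle_hidden(element):
    """Hypothetical handler for hidden elements; runs in a worker process."""
    return f"hidden:{element['name']}"


def handle_visible(element):
    """Hypothetical handler for all other elements; runs in the main process."""
    return f"visible:{element['name']}"


def process_elements(elements):
    with Pool() as pool:
        # Hand each hidden element to a worker without waiting for it to finish.
        pending = [pool.apply_async(handle_hidden, (el,))
                   for el in elements if el.get("type") == "hidden"]
        # Meanwhile, keep processing the remaining elements here.
        visible = [handle_visible(el) for el in elements if el.get("type") != "hidden"]
        # "Join": wait for the workers' results (get() re-raises worker exceptions).
        hidden = [res.get() for res in pending]
    return visible, hidden


if __name__ == "__main__":
    elements = [
        {"name": "q", "type": "text"},
        {"name": "token", "type": "hidden"},
        {"name": "go", "type": "submit"},
    ]
    print(process_elements(elements))
    # prints: (['visible:q', 'visible:go'], ['hidden:token'])
```

apply_async() returns immediately, so the main process is never blocked by the hidden-element work until it asks for the results. The pool also takes care of reaping its worker processes, and the `__main__` guard matters on platforms that start workers with the spawn method (Windows, and macOS by default).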

Upvotes: 1
