camz

Reputation: 605

State shared in multiprocessing when using singleton classes

Here is a toy example of the problem I am having. I have a singleton class which is used in a large Python script, and I want to run this script many times with multiple inputs. (The singleton pattern is not strictly necessary here, but in my more complicated real use case there is a reason for it.)

import time
import multiprocessing


class TestClass(object):
  instance = None

  @classmethod
  def get_instance(cls):
    if cls.instance is None:
      print('creating instance')
      cls.instance = TestClass()
    return cls.instance

  def __init__(self):
    self.data = []


def worker(num):
  tc = TestClass.get_instance()
  time.sleep(0.1)
  tc.data.append(num)
  return tc.data


def main():
  pool = multiprocessing.Pool(processes=1)
  res = pool.map(worker, range(10))

  print(res)
  print(TestClass.get_instance().data)


if __name__ == '__main__':
  main()

When I run the above code, the state of TestClass.instance appears to be (semi-?)shared. The result is:

[[0, 1, 2], [0, 1, 2], [0, 1, 2], ..., [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

If I change the number of processes to 10 I get:

[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]

This is the result I would like.

(If I print TestClass.get_instance().data after calling pool.map(), I get an empty list.)

What is the explanation for this behavior? Is there a way to stop this state sharing and get the second output, but keep control of the number of processes in the pool?

Edit: when I use a pool of N processes to run the function with M different arguments, an instance is created N times (once per process). Ideally I want M instances created, one for each argument.

Upvotes: 2

Views: 1567

Answers (1)

James Mills

Reputation: 19050

Okay; paraphrasing:

I want to create N processes to run the function for M different arguments; but instead of N worker processes I want M worker processes, one per argument.

This really isn't possible with multiprocessing.Pool, as it wasn't designed for that use case. Pool is more analogous to the builtin map() function: it applies a function to a sequence of inputs, distributing the (typically CPU-bound) work across a fixed set of long-lived worker processes, each of which handles many inputs.
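You can see that worker reuse directly by tagging each result with the handling process's id. This is an illustrative sketch (the function name which_process is mine, not from the question):

```python
import multiprocessing
import os


def which_process(num):
    # Return the input together with the pid of the worker that handled it.
    return num, os.getpid()


if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(which_process, range(10))
    pids = {pid for _, pid in results}
    print(len(pids))  # at most 2 distinct worker pids for 10 inputs
```

With 10 inputs and 2 workers, at most 2 distinct pids appear: the same processes (and hence the same module-level singletons) are reused across inputs, which is exactly the state sharing the question observes.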

You will have to manage a set of worker processes yourself using multiprocessing.Process.
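A minimal sketch of that approach, assuming the question's TestClass: spawn one multiprocessing.Process per argument and collect results through a Queue. Since each Process runs its own interpreter, the singleton is created fresh once per argument. (The worker and queue wiring here are illustrative, not from the original post.)

```python
import multiprocessing


class TestClass(object):
  # Same singleton pattern as in the question.
  instance = None

  @classmethod
  def get_instance(cls):
    if cls.instance is None:
      cls.instance = TestClass()
    return cls.instance

  def __init__(self):
    self.data = []


def worker(num, queue):
  # Each Process has its own interpreter, so get_instance() creates
  # a brand-new singleton here -- one instance per argument.
  tc = TestClass.get_instance()
  tc.data.append(num)
  queue.put(tc.data)


def main():
  queue = multiprocessing.Queue()
  procs = [multiprocessing.Process(target=worker, args=(n, queue))
           for n in range(10)]
  for p in procs:
    p.start()
  results = [queue.get() for _ in procs]  # may arrive out of input order
  for p in procs:
    p.join()
  return results


if __name__ == '__main__':
  print(sorted(main()))  # ten independent singletons: [[0], [1], ..., [9]]
```

Note this launches all ten processes at once; if you also need to cap concurrency at N, you would have to batch the processes or gate them with a multiprocessing.Semaphore yourself.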

Upvotes: 1
