Pramit Mazumder

Reputation: 80

Can't Fit Model in Python Process?

I can't get my Keras model to train inside a multiprocessing Process. The main process writes data to a queue, and I would like the model to train on that data concurrently in a separate process. However, the process hangs on the call to model.fit(). The model is a Keras multi-GPU model.

I also tried running the process as a non-daemon, with no change in behavior.

This works fine:

import sys
import time

def reader_proc(queue, model):
    while True:
        # Poll the queue; wait if no data has arrived yet.
        if queue.empty():
            time.sleep(10)
            continue
        d = queue.get()
        x = d[0]
        y = d[1]
        print("training")
        time.sleep(1)  # stand-in for the training step
        print(y[0])
        print("done training")
        sys.stdout.flush()

This does not:

import sys
import time

def reader_proc(queue, model):
    while True:
        # Poll the queue; wait if no data has arrived yet.
        if queue.empty():
            time.sleep(10)
            continue
        d = queue.get()
        x = d[0]
        y = d[1]
        print("training")
        # The process hangs on this call.
        model.fit(x=x, y=[y.T[0], y.T[1]], epochs=1, batch_size=32,
                  callbacks=[tensorboard_callback, checkpoint], shuffle=True)
        print("done training")
        sys.stdout.flush()

The process is started like so:

reader_p = Process(target=reader_proc, args=(pqueue, parallel_model))
reader_p.daemon = True
reader_p.start()

Calling fit() on the data outside the process also works fine:

d = pqueue.get()
x = d[0]
y = d[1]
parallel_model.fit(x=x, y=[y.T[0], y.T[1]], epochs=1, batch_size=32,
                   callbacks=[tensorboard_callback, checkpoint], shuffle=True)

When the call to model.fit() is added, the process prints "training" but never prints "done training". The example with sleep works as expected.

Upvotes: 1

Views: 608

Answers (1)

Roland Smith

Reputation: 43495

According to the multiprocessing documentation (see the note at the end of the linked section), multiprocessing.Pool does not work in IPython (which is an "interactive interpreter"), especially on MS Windows.
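If the interactive interpreter is the cause, one thing to try is moving the code into a plain script and starting the process under an `if __name__ == "__main__":` guard, so that child processes can import the main module safely. The following is only a minimal sketch of that layout, not a confirmed fix for the Keras hang: `fake_fit` and `run_demo` are hypothetical names, and `fake_fit` stands in for the real `model.fit()` call so the example runs without Keras. It also uses a blocking `queue.get()` with a `None` sentinel in place of the `empty()`/`sleep()` polling loop.

```python
import multiprocessing as mp


def fake_fit(x, y):
    # Hypothetical stand-in for model.fit(); returns a dummy "loss".
    return sum(x) + sum(y)


def reader_proc(queue, results):
    # Block on get() until data arrives; stop when the sentinel is seen.
    while True:
        item = queue.get()
        if item is None:  # sentinel: the producer has finished
            break
        x, y = item
        results.put(fake_fit(x, y))


def run_demo():
    queue, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=reader_proc, args=(queue, results))
    worker.start()
    queue.put(([1, 2], [3]))  # one batch of toy data
    queue.put(None)           # tell the worker to exit
    out = results.get(timeout=30)
    worker.join(timeout=30)
    return out


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block
    # when they import the main module.
    print(run_demo())
```

Save this as a script and run it with the regular python executable rather than pasting it into IPython. If the toy version works there, swapping `fake_fit` for the real training call would show whether the hang is tied to the interpreter or to the model itself.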

Upvotes: 2
