mota

Reputation: 5475

How can I reduce the memory footprint of this numpy gradient descent code?

I'm new to numpy and don't yet understand how it handles memory. I know my code isn't optimal memory-wise, but I couldn't think of another way to write it without understanding how the memory is handled. Here's the code:

import numpy as np

# X.shape = (700, 23000)
# Y.shape = (700, 23000)
learning_rate = 0.7
iters = 100

W = np.random.random((Y.shape[0], X.shape[0])) - 0.5
num_examples = Y.shape[1]

for i in range(int(iters)):
    error = W.dot(X) - Y                    # (700, 23000)
    xerror = X.dot(error.T)                 # (700, 700)
    d = (1.0 / num_examples * xerror).T     # average gradient
    W = W - (learning_rate * d)

The code works when the number of training examples is small (~1000), but when it's as big as 20K the memory explodes. Any help to optimise the memory footprint of this code is very much appreciated.

Upvotes: 1

Views: 206

Answers (1)

CrazyCasta

Reputation: 28342

It depends on what you mean by the memory "exploding". NumPy uses double precision (float64) for matrices and vectors by default, which means 8 bytes per number. A 20k x 20k problem is therefore 400M numbers, or 3.2 GB. If that's what you mean, it's simply a problem of scale: the problem is too big to hold densely, and you need a different way to represent it if that much memory is unacceptable.
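As a quick sanity check on that arithmetic (my addition, not part of the original answer), you can ask NumPy directly how many bytes an array occupies via its nbytes attribute:

import numpy as np

# np.random.random and np.zeros produce float64, i.e. 8 bytes per element
w = np.zeros((700, 700))        # same shape as W in the question
err = np.zeros((700, 23000))    # same shape as W.dot(X) - Y

print(w.dtype, w.nbytes / 1e6, "MB")      # float64 ~3.9 MB
print(err.dtype, err.nbytes / 1e9, "GB")  # float64 ~0.13 GB

# the dense 20k x 20k matrix mentioned above:
print(20_000 * 20_000 * 8 / 1e9, "GB")    # 3.2 GB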

Based on your comment that X and Y are matrices: if your problem size is only around 20k, you can get some savings by handling X and Y one column at a time. At 23k rows that will reduce your memory footprint to 1/3. If you scale up to 46k rows the footprint only goes down to 2/3 (a reduction of just 1/3), and by 92k rows you'll only reduce it to 16/18 (about 0.89).
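A rough sketch of that idea (my own illustration; the block size batch_cols is an arbitrary choice, not something from the question): accumulate X.dot(error.T) over column blocks of X and Y, so the full 700 x 23000 error matrix is never materialised.

import numpy as np

def gradient_step(W, X, Y, learning_rate, batch_cols=1000):
    # Accumulate the gradient one block of columns at a time, so only a
    # (700, batch_cols) slice of the error matrix exists at any moment.
    num_examples = Y.shape[1]
    xerror = np.zeros((X.shape[0], W.shape[0]))
    for start in range(0, num_examples, batch_cols):
        cols = slice(start, start + batch_cols)
        error_block = W.dot(X[:, cols]) - Y[:, cols]  # (700, batch_cols)
        xerror += X[:, cols].dot(error_block.T)       # (700, 700)
    d = (xerror / num_examples).T
    return W - learning_rate * d

X and Y themselves still have to fit in memory, so this only removes the temporary error matrix.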

Generally when you go to large problems you start dealing with things like iterative algorithms and sparse matrices to reduce the memory load. See for instance conjugate gradient.
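For illustration only (this goes beyond the answer, and the names and shapes are assumptions based on the question): SciPy's conjugate gradient solver can run "matrix-free" through a LinearOperator, so the normal-equations matrix X X^T never has to be formed densely. CG needs a symmetric positive (semi)definite operator, which X X^T is; here each row of W is solved for separately.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_row(X, y_row):
    # Solve the normal equations (X X^T) w = X y_row for one row of W,
    # supplying only matrix-vector products instead of X X^T itself.
    n = X.shape[0]
    A = LinearOperator((n, n), matvec=lambda v: X.dot(X.T.dot(v)))
    b = X.dot(y_row)
    w, info = cg(A, b)  # info == 0 means CG converged
    return w

# Hypothetical usage with the question's shapes:
# W = np.vstack([solve_row(X, Y[i]) for i in range(Y.shape[0])])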

Upvotes: 1
