Reputation: 5148
I have a memory constraint of 4 GB of RAM, and I need to keep 2.5 GB of data in RAM in order to perform further operations.
import numpy
a = numpy.random.rand(1000000, 100)  # This needs to be in memory
b = numpy.random.rand(1, 100)
c = a - b  # This is needed in order to perform the next operation
d = numpy.linalg.norm(numpy.asarray(c, dtype=numpy.float32), axis=1)
While creating c, memory usage explodes and Python gets killed. Is there a way to speed up this process? I am running this on an EC2 Ubuntu instance with 4 GB of RAM and a single core. When I perform the same calculation on my Mac (OS X), it completes easily without any memory problem and also takes less time. Why is this happening?
One solution that I can think of is
d = [numpy.sqrt(numpy.dot(i - b[0], i - b[0])) for i in a]
which I don't think will be good for speed.
Upvotes: 0
Views: 200
Reputation: 114811
If the creation of `a` doesn't cause a memory problem, and you don't need to preserve the values in `a`, you could compute `c` by modifying `a` in place:
a -= b # Now use `a` instead of `c`.
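As a rough sketch of why this helps: `a - b` allocates a second array with the same shape as `a`, while `a -= b` broadcasts `b` and writes the result into the buffer `a` already owns. The array sizes below are deliberately small and chosen only for illustration, and the names `expected` and `same_buffer` are hypothetical helpers for the check.

```python
import numpy as np

a = np.random.rand(1000, 100)
b = np.random.rand(1, 100)

# Out-of-place: allocates a new (1000, 100) array next to `a`.
expected = a - b

# Address of the data buffer that `a` currently owns.
buffer_before = a.__array_interface__["data"][0]

# In-place: `b` is broadcast across the rows and `a` is overwritten.
a -= b

buffer_after = a.__array_interface__["data"][0]
same_buffer = buffer_before == buffer_after  # no new array was allocated for `a`

# `a` now holds what `c` would have held.
d = np.linalg.norm(a, axis=1)
```

Note that the original values of `a` are gone after this; if they are needed later, add `b` back with `a += b` (up to floating-point rounding) or use a batched approach instead.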
Otherwise, the idea of working in smaller chunks or batches is a good one. With your list comprehension solution, you are, in effect, computing `d` from `a` and `b` in batch sizes of one row of `a`. You can improve the efficiency by using a larger batch size. Here's an example; it includes your code (with some cosmetic changes) and a version that computes the result (called `d2`) in batches of `batch_size` rows of `a`.
import numpy as np
#n = 1000000
n = 1000
a = np.random.rand(n,100) ## This needs to be in memory
b = np.random.rand(1,100)
c = a - b  # This is needed in order to perform the next operation
d = np.linalg.norm(np.asarray(c), axis=1)
batch_size = 300
# Preallocate the result.
d2 = np.empty(n)
for start in range(0, n, batch_size):
    end = min(start + batch_size, n)
    c2 = a[start:end] - b
    d2[start:end] = np.linalg.norm(c2, axis=1)
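The batched loop can also be wrapped in a function that reuses a single scratch buffer, so the difference array is never larger than one batch. This is only a sketch under a few assumptions: the function name `batched_row_norms` and the default `batch_size` are made up for illustration, and `a` and `b` are assumed to be floating-point arrays of the same dtype. With 100 float64 columns and `batch_size=10000`, the scratch buffer takes 10000 * 100 * 8 bytes, i.e. 8 MB; `np.linalg.norm` may still create temporaries of roughly that size internally.

```python
import numpy as np

def batched_row_norms(a, b, batch_size=10000):
    """Return the Euclidean norm of each row of (a - b), one batch at a time."""
    n = a.shape[0]
    out = np.empty(n, dtype=a.dtype)
    # Scratch buffer that is reused for every batch.
    scratch = np.empty((min(batch_size, n), a.shape[1]), dtype=a.dtype)
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        chunk = scratch[:end - start]            # the last batch may be shorter
        np.subtract(a[start:end], b, out=chunk)  # write the difference into the buffer
        out[start:end] = np.linalg.norm(chunk, axis=1)
    return out

# Small usage example, compared against the all-at-once computation.
a = np.random.rand(1000, 100)
b = np.random.rand(1, 100)
d2 = batched_row_norms(a, b, batch_size=300)
reference = np.linalg.norm(a - b, axis=1)
```

A larger `batch_size` means fewer Python-level loop iterations but a larger temporary, so the value is a trade-off to tune against the available RAM.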
Upvotes: 2
Reputation: 4051
If memory is the reason your code is slowing down and crashing, why not use a generator expression instead of a list comprehension?
d = (numpy.sqrt(numpy.dot(i - b[0], i - b[0])) for i in a)
A generator essentially provides the steps to get the next object from an iterator. In other words, no computation is performed and no data is stored until you call next() on the generator. I'm stressing this because I don't want you to think that the line above does all the computations when it runs; rather, it stores the instructions.
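To actually get the values, the generator has to be consumed at some point. One possible way, sketched below, is `numpy.fromiter`, which fills a preallocated array directly from the generator instead of building a Python list first. The name `b_row` is an assumption introduced here: because `b` has shape (1, 100), `i - b` would be 2-D and `numpy.dot` of it with itself would fail with a shape mismatch, so the sketch uses the 1-D row `b[0]`. The array sizes are kept small for illustration.

```python
import numpy as np

a = np.random.rand(1000, 100)
b = np.random.rand(1, 100)
b_row = b[0]  # 1-D view of b, so that i - b_row is 1-D

# Nothing is computed yet; this only creates the generator object.
gen = (np.sqrt(np.dot(i - b_row, i - b_row)) for i in a)

# Each value is computed on demand and written straight into the result array.
d = np.fromiter(gen, dtype=np.float64, count=len(a))
```

This keeps the extra memory down to one row at a time, but it still runs a Python-level loop over every row of `a`, so it is likely to be slower than a batched, vectorized computation.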
Upvotes: 1