Reputation: 450
I have two arrays A and B of shape (10000,100,100) (very large). I need to perform a series of operations on them and pass the results to other functions. My question is: how can I use the least memory? Let me give a specific example.
import numpy as np

A = np.random.rand(10000,100,100)
B = np.random.rand(10000,100,100)

def ave_l2_error(diffs):
    for err in diffs:
        print(np.mean(err))

def ave_l1_error(diffs):
    for err in diffs:
        print(np.mean(err))

# Is there a difference in terms of memory usage between doing this:
L2 = [np.power(A-B, 2)]
L1 = [np.abs(A-B)]
ave_l2_error(L2)
ave_l1_error(L1)

# vs this:
ave_l2_error([np.power(A-B, 2)])
ave_l1_error([np.abs(A-B)])
I would think the first case uses more memory because it stores L1 and L2. This reddit thread discusses renaming variables, but this is a slightly different situation (or maybe not). Would the garbage collector detect that L1 and L2 are no longer used and delete them? What if the code is run in IPython (instead of a shell), where one still has access to the variables? Would that make a difference?
Upvotes: 0
Views: 476
Reputation: 781004
In the first version, the arrays created by np.power() and np.abs() will stay in memory until the script ends, because the variables L2 and L1 still refer to them, which prevents them from becoming garbage.
In the second version, the arrays will be garbage collected when the function returns, because they were only assigned to the function parameters, which go away when the function exits. So this version will use less memory.
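A minimal sketch of how this could be observed, assuming CPython (where an object is freed as soon as its last reference goes away). The small array shape, the helper name ave_error, and the refs list are hypothetical and only there for illustration; a weak reference is used because it does not keep the array alive.

```python
import weakref
import numpy as np

# Small shapes so the sketch runs quickly; the question uses (10000, 100, 100).
A = np.random.rand(10, 100, 100)
B = np.random.rand(10, 100, 100)

refs = []  # hypothetical: collects weak references to the arrays the function sees

def ave_error(diffs):
    # Hypothetical stand-in for ave_l2_error / ave_l1_error.
    for err in diffs:
        refs.append(weakref.ref(err))  # a weak reference does not keep the array alive
        print(np.mean(err))

# Second version: the list and its array are only reachable through the argument.
ave_error([np.power(A - B, 2)])

# After the call returns, nothing refers to the array any more.
freed_after_call = refs[0]() is None
print(freed_after_call)
```

Under these assumptions, the weak reference should be dead right after the call, which means the temporary array has already been released before the next large array is computed.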
You can make the first version like the second if you reassign or delete the variables after using them in the function calls.
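As a rough sketch of the delete approach, again assuming CPython and using a small hypothetical array shape and a hypothetical helper ave_error in place of the functions from the question:

```python
import weakref
import numpy as np

# Small shapes so the sketch runs quickly; the question uses (10000, 100, 100).
A = np.random.rand(10, 100, 100)
B = np.random.rand(10, 100, 100)

def ave_error(diffs):
    # Hypothetical stand-in for ave_l2_error / ave_l1_error.
    return [float(np.mean(err)) for err in diffs]

# First version: the variable L2 keeps the array alive after the call.
L2 = [np.power(A - B, 2)]
l2_ref = weakref.ref(L2[0])        # observe the array without keeping it alive
l2_means = ave_error(L2)
alive_before_del = l2_ref() is not None

# Dropping the variable removes the last reference, so the array can be freed
# before the next large intermediate is created.
del L2
alive_after_del = l2_ref() is not None

L1 = [np.abs(A - B)]
l1_means = ave_error(L1)
del L1
```

Reassigning the name (for example L2 = None) should have the same effect as del, since either way the list and its array lose their last reference.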
Upvotes: 1