Reputation: 513
I am using list comprehension to index a numpy array and sum the values:
df[col] = np.array([A_numpy_array[b].sum() for b in B_numpy_array])
My A_numpy_array is indexed using the elements b of B_numpy_array (which has 8-9 million elements).
This is the part of the code where the process takes a while: I completely run out of RAM and the system starts swapping to disk.
To my knowledge, a list comprehension is one of the more efficient constructs in Python, and assigning a pandas column this way is also efficient.
Is there an alternative way to slice A_numpy_array using the index values held in each b that would let me compute the sums in a more memory-efficient way?
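For concreteness, a minimal reproduction of the pattern (the array names are from my code; the data here is small and made up, whereas my real B_numpy_array has 8-9 million elements):

```python
import numpy as np

# Hypothetical stand-ins for the real arrays.
A_numpy_array = np.arange(10, dtype=np.float64)

# Each element b is itself an index array into A_numpy_array.
B_numpy_array = [np.array([0, 1, 2]),
                 np.array([3, 4]),
                 np.array([5, 6, 7, 8, 9])]

# The pattern in question: fancy-index A by each b, then sum.
sums = np.array([A_numpy_array[b].sum() for b in B_numpy_array])
print(sums)  # [ 3.  7. 35.]
```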
Upvotes: 1
Views: 303
Reputation: 3066
Depending on your data and how much precision you need, changing the data type is the easiest way to reduce memory usage.
Check Numpy's Data Types and Pandas' Data Types for more information.
For example, sacrificing some precision by using float32 instead of float64 can save you a lot of memory.
Before diving into optimizing your code, it's worth trying this simple solution.
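As a rough sketch of the savings, using an array size matching the ~9 million elements mentioned in the question (the data itself is a placeholder):

```python
import numpy as np

n = 9_000_000  # roughly the size mentioned in the question

# Same data, two precisions.
a64 = np.zeros(n, dtype=np.float64)
a32 = a64.astype(np.float32)

# float32 uses exactly half the memory of float64.
print(a64.nbytes // 1024**2)  # 68 (MiB)
print(a32.nbytes // 1024**2)  # 34 (MiB)
```

The same idea applies to integer index arrays: if the indices fit, an int32 B array takes half the memory of the default int64 on most platforms.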
Upvotes: 1