Kdog

Reputation: 513

Memory use with indexing and summing in numpy array

I am using list comprehension to index a numpy array and sum the values:

df[col] = np.array([A_numpy_array[b].sum() for b in B_numpy_array])

My A_numpy_array is indexed using the elements b of B_numpy_array (which has 8-9 million elements).

This part of the code is where the process takes a while: I completely run out of RAM and the machine starts swapping to disk.

To my knowledge, a list comprehension is one of the more efficient constructs in Python, and assigning the pandas column this way is also efficient.

Is there an alternative way to slice A_numpy_array using the index values held in each b that would let me compute the sums in a more memory-efficient way?
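For reference, a minimal, small-scale sketch of the setup described above (the array names match the question, but the shapes and the assumption that B_numpy_array holds integer index arrays are hypothetical):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # A_numpy_array holds the values; B_numpy_array holds index arrays into it.
    A_numpy_array = rng.random(100)
    B_numpy_array = np.array(
        [rng.integers(0, 100, size=rng.integers(1, 10)) for _ in range(10)],
        dtype=object,
    )

    df = pd.DataFrame(index=range(10))
    col = "sums"

    # Current approach: one fancy-indexing + sum per element of B_numpy_array.
    df[col] = np.array([A_numpy_array[b].sum() for b in B_numpy_array])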

Upvotes: 1

Views: 303

Answers (1)

Roim

Reputation: 3066

Depending on your data and how precise you want to be, changing the type of your data is the easiest way to reduce memory usage.

Check Numpy's Data Types and Pandas' Data Types for more information.

For example, sacrificing some precision by using float32 instead of float64 can save you a lot of memory.

Before diving into optimizing your code, it's worth trying this simple change first; a sketch is below.
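A minimal sketch of the idea, using hypothetical array and column names that mirror the question (downcasting float64 to float32 halves the memory footprint):

    import numpy as np
    import pandas as pd

    A_numpy_array = np.random.rand(1_000_000)      # default dtype is float64
    print(A_numpy_array.nbytes)                    # ~8 MB

    A_small = A_numpy_array.astype(np.float32)     # half the memory
    print(A_small.nbytes)                          # ~4 MB

    # The same idea applies to the resulting pandas column:
    df = pd.DataFrame()
    df["col"] = np.zeros(1_000_000, dtype=np.float32)
    print(df["col"].dtype)                         # float32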

Upvotes: 1
