nickexists

Reputation: 573

Large Array of binary data

I'm working with a large three-dimensional array of binary data: each value is one of two possible values. I currently have this data stored in a numpy array as int32 values that are either 1 or 0.

It works fine for small arrays, but eventually I will need to make the array 5000x5000x20, which I can't get anywhere close to without getting a MemoryError.

Does anyone have any suggestions for a better way to do this? I am really hoping that I can keep it all together in one data structure because I will need to access slices of it along all three axes.
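
For scale, at 4 bytes per int32 value the full array works out to roughly 1.9 GiB:

>>> 5000 * 5000 * 20 * 4 / 2**30   # int32 bytes -> GiB
1.862645149230957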

Upvotes: 1

Views: 496

Answers (2)

Emanuele Paolini

Reputation: 10172

Another possibility is to represent the 20 values along the last axis as the bits of a single 32-bit integer. This way a 5000x5000 array would suffice.
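
A rough sketch of how that packing might look (the small example array and the variable names here are just for illustration):

import numpy as np

# Small illustrative array: binary data with 20 planes along the last axis.
data = np.random.randint(0, 2, size=(4, 4, 20), dtype=np.uint8)

# Pack the 20 bits of the last axis into one uint32 per (row, col) cell.
weights = (1 << np.arange(20)).astype(np.uint32)             # bit k has weight 2**k
packed = (data.astype(np.uint32) * weights).sum(axis=-1).astype(np.uint32)
# packed has shape (4, 4): 4 bytes per cell instead of 20 (uint8) or 80 (int32).

# To read bit k back out for every cell:
k = 7
bit_k = (packed >> k) & 1
assert np.array_equal(bit_k, data[..., k])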

Upvotes: 1

sapi

Reputation: 10224

You'll get better performance if you change the datatype of your numpy array to something smaller.

For data which can take one of two values, you could use uint8, which will always be a single byte:

import numpy as np

arr = np.array(your_data, dtype=np.uint8)

Alternatively, you could use np.bool, though I'm not sure offhand whether that is in fact an 8-bit value or whether it uses the native word size. (I tend to use the 8-bit value explicitly for clarity, though that's more a personal choice.)
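
One quick way to check is simply to ask the arrays themselves; a small throwaway comparison (using numpy's bool_ array dtype) looks like this:

import numpy as np

# Small throwaway array just to compare element sizes.
a_int32 = np.zeros((100, 100, 20), dtype=np.int32)
a_uint8 = a_int32.astype(np.uint8)
a_bool = a_int32.astype(np.bool_)    # numpy's bool_ dtype stores one byte per element

print(a_int32.itemsize, a_uint8.itemsize, a_bool.itemsize)   # 4 1 1
print(a_int32.nbytes, a_uint8.nbytes, a_bool.nbytes)         # 800000 200000 200000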


At the end of the day, though, you're talking about a lot of data, and it's quite possible that even with a smaller datatype, you won't be able to load it all into Python at once.

In that case, it might be worth investigating whether you can break up your problem into smaller parts.
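
If it comes to that, one rough sketch is to work through the volume in slabs along one axis; load_slab below is purely hypothetical, standing in for however the data actually gets produced or read:

import numpy as np

# Hypothetical stand-in for however the binary data is really produced or read.
def load_slab(start, stop, shape=(5000, 5000, 20)):
    return np.random.randint(0, 2, size=(stop - start, shape[1], shape[2]), dtype=np.uint8)

SLAB = 500          # rows per chunk; tune to the memory you have available
N_ROWS = 5000

totals = np.zeros(20, dtype=np.int64)
for start in range(0, N_ROWS, SLAB):
    stop = min(start + SLAB, N_ROWS)
    slab = load_slab(start, stop)                       # shape (stop - start, 5000, 20)
    totals += slab.sum(axis=(0, 1), dtype=np.int64)     # e.g. count of ones per layer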

Upvotes: 2
