Ulderique Demoitre

Reputation: 1068

Hardware requirements to deal with a big matrix - python

I am working on a Python project where I will need to work with a matrix whose size is around 10000 x 10000 x 10000.

Considering that:

Are my requirements realistic? What hardware would I need to work this way in a reasonable amount of time?

I am also open to switching languages (for example, performing the linear algebra operations in C) if that would improve performance.

Upvotes: 2

Views: 320

Answers (3)

pbreach

Reputation: 17017

Maybe something like dask would be a good fit for you? There are other ways to do this with numpy, like using memory-mapped arrays and doing operations in parallelizable chunks, but they will be a bit more difficult, especially if you are still getting comfortable with Python.

Personally, I don't see much benefit in using a different language for this task. You'll still have to deal with hardware limitations and chunking for parallel operations. With dask, this comes pretty much out of the box.
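The numpy alternative mentioned above (a memory-mapped array processed in chunks) could look roughly like the sketch below. It is a minimal example under stated assumptions, not a tuned implementation: it assumes float32 data stored as a raw binary file, the helper names `build_demo` and `chunked_sum` are hypothetical, and the tiny shape stands in for the real 10000 x 10000 x 10000 one. Each slab along the first axis is handled independently, which is what makes the work parallelizable; dask automates this kind of splitting and scheduling.

```python
import os
import tempfile

import numpy as np


def build_demo(path, shape, chunk, dtype=np.float32):
    """Hypothetical helper: write a disk-backed array one slab at a time (demo data: all ones)."""
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    for start in range(0, shape[0], chunk):
        # Only this slab is written; the full array never has to fit in RAM.
        arr[start:start + chunk] = 1.0
    arr.flush()  # push pending writes to the file
    del arr      # release the mapping


def chunked_sum(path, shape, chunk, dtype=np.float32):
    """Hypothetical helper: sum a disk-backed 3-D array slab by slab along axis 0."""
    arr = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    total = 0.0
    for start in range(0, shape[0], chunk):
        slab = arr[start:start + chunk]
        # Accumulate in float64 to limit rounding error across many chunks.
        total += float(slab.sum(dtype=np.float64))
    del arr
    return total


if __name__ == "__main__":
    shape = (32, 100, 100)  # tiny stand-in for (10000, 10000, 10000)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "big_matrix.dat")
        build_demo(path, shape, chunk=8)
        print(chunked_sum(path, shape, chunk=8))  # 320000.0
```

The chunk size trades RAM for the number of disk reads; on the real matrix each slab of 8 x 10000 x 10000 float32 values would be about 3.2 GB.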

Upvotes: 1

Shirkam

Reputation: 754

Well, the first question is: what type of value will you store in your matrix? Supposing it will be integers (and supposing each integer takes 4 bytes, the usual size of a 32-bit int), you will have 4*10^12 bytes to store. That's a large amount of data (4 TB), so, first of all, I don't know where you are getting all that information from, and I suggest you load only the parts of it that you can manage easily.
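The same arithmetic for a few other element types can be sketched as below. This is a minimal example that takes the per-element size from NumPy's `dtype.itemsize` and reports decimal terabytes (10^12 bytes); the helper name `dense_size_bytes` is hypothetical.

```python
import numpy as np


def dense_size_bytes(shape, dtype):
    """Bytes needed to hold a dense array of `shape` whose elements are `dtype`."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim  # plain Python ints, so no overflow
    return n_elements * np.dtype(dtype).itemsize


shape = (10_000, 10_000, 10_000)
for dtype in (np.int8, np.int32, np.float32, np.float64):
    tb = dense_size_bytes(shape, dtype) / 1e12  # decimal terabytes
    print(f"{np.dtype(dtype).name:>8}: {tb:g} TB")
```

For this shape it gives 1 TB for int8, 4 TB for int32 and float32, and 8 TB for float64, so picking the narrowest type that fits the data directly reduces the hardware needed.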

On the other hand, since the work can be parallelized, I would recommend using CUDA if you can afford an NVIDIA card, which should give you much better performance.

In summary, it's hard to hold all that data in RAM alone, and you should use parallel languages.

P.S.: You are misusing big-O notation for the algorithm's time complexity. You should have said that it is O(n), with n = size_of_the_matrix (the total number of elements), or O(nmt), with n, m and t being the dimensions of the matrix.
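For the matrix in the question, and assuming constant work per element, a single pass over every element works out as follows (writing N for the total number of elements to keep it distinct from the dimension n):

```latex
N = n \cdot m \cdot t = 10^{4} \cdot 10^{4} \cdot 10^{4} = 10^{12},
\qquad
T(N) = O(N) = O(nmt)
```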

Upvotes: 4

Nick Slavsky

Reputation: 1330

Actually, memory would be a big issue here, depending on the type of the matrix elements. A plain Python float, for example, takes 24 bytes because it is a boxed object. As your matrix has 10^12 elements, you can do the math. Switching to C would probably make it more memory-efficient, but not faster, as numpy is essentially written in C with lots of optimizations.
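To make the "do the math" step concrete, the rough sketch below compares a Python list of boxed floats with a NumPy float64 array, which stores raw 8-byte values without a per-element object. It assumes a 64-bit CPython build and that every list element is a distinct float object; the exact size reported by `sys.getsizeof` can vary between builds.

```python
import struct
import sys

import numpy as np

N_ELEMENTS = 10_000 ** 3  # 10^12 elements in a 10000 x 10000 x 10000 matrix

# Size of one CPython float object (typically 24 bytes on 64-bit builds).
boxed_float = sys.getsizeof(1.0)
# A Python list also stores one pointer per element on top of the objects.
list_pointer = struct.calcsize("P")
# Size of one element inside a float64 NumPy array: the raw 8-byte double.
unboxed_float = np.dtype(np.float64).itemsize

list_tb = N_ELEMENTS * (boxed_float + list_pointer) / 1e12
array_tb = N_ELEMENTS * unboxed_float / 1e12
print(f"list of Python floats: ~{list_tb:g} TB")
print(f"NumPy float64 array:   ~{array_tb:g} TB")
```

On a typical 64-bit CPython this comes to about 32 TB versus 8 TB: NumPy removes the boxing overhead, but even the compact layout is far beyond the RAM of an ordinary machine.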

Upvotes: 1
