shihpeng

Reputation: 5381

Best module to do large matrix computation in Python?

I am developing a simple recommendation system and need to run computations such as SVD, RBM, etc.

To make the evaluation more convincing, I am going to use the MovieLens or Netflix dataset to measure the system's performance. However, both datasets have more than 1 million users and more than 10 thousand items, so it's impossible to fit all the data into memory. I need a module designed to handle such a large matrix.

I know there are some tools in SciPy that can handle this, and divisi2, used by python-recsys, also seems like a good choice. Or maybe there are better tools I don't know about?

Which module should I use? Any suggestion?

Upvotes: 4

Views: 3465

Answers (3)

Austin Henley

Reputation: 4633

I would suggest SciPy, specifically `scipy.sparse`. As Dougal pointed out, dense NumPy arrays are not suited for this situation.
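As a minimal sketch of the idea (the shapes and values here are made up for illustration): store the ratings as a sparse matrix, which keeps only the nonzero entries, and run a truncated SVD on it with `scipy.sparse.linalg.svds`.

```python
# Sketch: truncated SVD on a sparse ratings matrix with scipy.sparse.
# A 1M x 10k ratings matrix with a few million nonzeros fits comfortably
# in memory in CSR form, because only the nonzero entries are stored.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy (user, item, rating) triples -- stand-ins for a real dataset
rows = np.array([0, 0, 1, 2, 2, 3])
cols = np.array([0, 2, 1, 0, 3, 2])
vals = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0])

ratings = csr_matrix((vals, (rows, cols)), shape=(4, 4))

# Truncated SVD with k latent factors (k must be < min(ratings.shape))
U, s, Vt = svds(ratings, k=2)
print(U.shape, s.shape, Vt.shape)  # (4, 2) (2,) (2, 4)
```

With a real dataset you would build `rows`/`cols`/`vals` from the rating triples file and pick `k` in the tens to hundreds.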

Upvotes: 6

Hoai-Thu Vuong

Reputation: 1957

I found another option named Crab; I am trying out and comparing several of these libraries myself.

Upvotes: 2

specialscope

Reputation: 4228

If your concern is just fitting the data into memory, use 64-bit Python with 64-bit NumPy. If you don't have enough physical memory, you can increase virtual memory at the OS level; the size of virtual memory is limited only by your HDD size. Speed of computation, however, is a different beast!
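In the same spirit (this is my illustration, not part of the answer), NumPy's `memmap` makes the disk-backing explicit: the array lives in a file and the OS pages chunks in on demand, so it can be larger than physical RAM. A minimal sketch with made-up dimensions and a temporary file path:

```python
# Sketch: a disk-backed dense matrix via numpy.memmap. The OS pages the
# file in on demand, so the array may exceed physical RAM -- but, as the
# answer notes, computation over it will be far slower than in-core work.
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "ratings.dat")

# Create a disk-backed float32 matrix (think users x items)
m = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 200))
m[42, 7] = 5.0   # writes go to the mapped file
m.flush()

# Reopen read-only later without loading the whole file into RAM
m2 = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 200))
print(m2[42, 7])  # 5.0
```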

Upvotes: -1
