Reputation: 95
I have a large data set (185GB) that I plan to run some machine learning algorithms on. The data is on a local computer with limited computational power. I have access to a remote cluster where I can run my computationally expensive algorithms. It has 1TB of memory and is pretty fast. But for some reason I only have 2GB(!) of disk storage on the remote server.
I can connect to the cluster via SSH. Is there any way, in Python, to load the data set into the cluster's RAM over SSH?
Any general tips on how to tackle this problem are much appreciated.
Upvotes: 4
Views: 484
Reputation: 1496
You may want to use paramiko so that you can connect with SSH from within Python. Then you can run commands that output your data and read it from the stream. This works better than copying the files over because it doesn't involve writing the data to disk. If the data is in files, you can simply use paramiko to cat the files and read the data from the stream.
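As a minimal sketch of this idea, run the script on the cluster and SSH back to the machine that holds the data (the hostname, username, and file path below are placeholders, and this assumes the local machine accepts SSH connections):

    import paramiko

    # Connect from the cluster back to the machine holding the data.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("local-machine.example.com", username="me")

    # cat the file on the far end and read it from the stream,
    # so nothing is ever written to the cluster's 2GB disk.
    stdin, stdout, stderr = client.exec_command("cat /path/to/dataset.csv")
    data = stdout.read()  # the whole file ends up in RAM
    client.close()

Since stdout is a file-like object, you can also iterate over it line by line (for line in stdout: ...) and build your in-memory structures incrementally instead of holding one giant bytes object.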
Upvotes: 1