oceanhug

Reputation: 1392

What is the best software stack for a small scientific computing cluster

I work in a research group doing a lot of Machine Learning and Computational Biology.

We currently have a cluster, but it is poorly maintained, suffers from low I/O throughput, and most critically doesn't have any setup for scheduling or load-balancing. Therefore, to use it, you have to find a free node yourself, ssh into that node, run your script on the command line, and manually collect your results.

What is the best software stack for implementing an easy-to-use scheduler and load balancer, so that users can submit their jobs to a central queue, have them run automatically when resources become available, and easily get their results back?
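To make the requested behaviour concrete, here is a toy sketch of a central queue that starts each job as soon as some node has enough free cores. It is purely illustrative: the `Job` fields, the `simulate` function and the abstract time steps are hypothetical, and it uses strict first-in-first-out ordering with no priorities or backfilling, which production schedulers such as Slurm and Maui do add.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int      # cores the job needs on a single node (hypothetical field)
    duration: int   # run time in abstract time steps (hypothetical field)


def simulate(jobs, node_cores):
    """Run a strict FIFO queue over nodes; node_cores[i] is node i's core count.

    Returns a list of (start_time, job_name, node_index) tuples.
    """
    if any(job.cores > max(node_cores) for job in jobs):
        raise ValueError("a job needs more cores than any single node has")
    queue = deque(jobs)
    free = list(node_cores)
    running = []   # (end_time, node_index, cores) for jobs currently running
    schedule = []
    t = 0
    while queue or running:
        # Release the cores of every job that has finished by time t.
        for end, node, cores in running:
            if end <= t:
                free[node] += cores
        running = [r for r in running if r[0] > t]
        # Start jobs from the head of the queue while some node has room.
        while queue:
            job = queue[0]
            node = next((i for i, f in enumerate(free) if f >= job.cores), None)
            if node is None:
                break  # head job must wait: strict FIFO, no backfilling
            queue.popleft()
            free[node] -= job.cores
            running.append((t + job.duration, node, job.cores))
            schedule.append((t, job.name, node))
        # Jump ahead to the next job completion.
        if running:
            t = min(r[0] for r in running)
    return schedule


jobs = [Job("a", 4, 3), Job("b", 4, 2), Job("c", 2, 1)]
print(simulate(jobs, node_cores=[4, 4]))
# → [(0, 'a', 0), (0, 'b', 1), (2, 'c', 1)]
```

Job `c` waits in the queue until `b` frees node 1 at time 2; nobody has to hunt for an idle node by hand.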

Upvotes: 0

Views: 1187

Answers (1)

Jonathan Dursi

Reputation: 50927

There are a number of well-regarded open-source scheduler/resource manager options:

  • Torque/Maui, descendants of the venerable PBS, now maintained by Adaptive Computing
  • Slurm, a newer project out of LLNL, which has the advantage of scaling very well
  • Open Grid Engine, née Sun Grid Engine
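As one illustration of what day-to-day use looks like once a scheduler is in place, here is a minimal sketch for Slurm that builds a batch script and hands it to `sbatch`. The helper names `slurm_script` and `submit` and the default resource values are hypothetical; the `#SBATCH` options, the `%x`/`%j` output-file patterns and `sbatch --parsable` are standard Slurm features, but check them against your site's configuration.

```python
import subprocess


def slurm_script(job_name, command, cpus=1, mem="4G", time_limit="01:00:00"):
    """Return the text of a minimal Slurm batch script (defaults are illustrative)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Pipe the script to sbatch on stdin (needs a working Slurm install).

    Returns the job ID; --parsable prints "jobid" or "jobid;cluster".
    """
    result = subprocess.run(
        ["sbatch", "--parsable"],
        input=script_text, text=True, capture_output=True, check=True,
    )
    return result.stdout.strip().split(";")[0]


if __name__ == "__main__":
    print(slurm_script("train", "python train.py", cpus=4))
```

Users would then call `submit(...)`, watch the queue with `squeue -u $USER`, and collect results from the `train-<jobid>.out` file instead of ssh-ing into individual nodes.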

But there are also a number of complete software stacks that aim to make managing a cluster easier:

I'm making this a community wiki for others who have suggestions.

Upvotes: 2
