Reputation: 61
I've recently run into a problem processing an 8 GB pickle file with a Python script on VMs in Google Cloud Compute Engine. The problem is that the processing takes too long, and I am searching for ways to reduce the time. One possible solution could be to split the work in the script across processes, or to map it across the CPUs of several VMs. If somebody knows how to do this, please share!
Upvotes: 0
Views: 2403
Reputation: 775
You can use Clusters for Large-scale Technical Computing on the Google Cloud Platform (GCP). Open source software such as ElastiCluster provides cluster management and supports provisioning nodes on Google Compute Engine (GCE).
After the cluster is operational, a workload manager handles task execution and node allocation. There are a variety of popular commercial and open source workload managers, such as HTCondor from the University of Wisconsin, Slurm from SchedMD, Univa Grid Engine, and LSF Symphony from IBM.
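If you go the Slurm route, for example, dispatching work from Python can be as simple as shelling out to sbatch. Here is a minimal sketch, assuming Slurm is already running on the cluster and that a hypothetical process_chunk.py script handles one piece of the work:

    # Submit one independent Slurm job per chunk of work.
    # Assumes `sbatch` is on PATH and that process_chunk.py
    # (hypothetical) processes the chunk index it receives.
    import subprocess

    N_CHUNKS = 16  # illustrative number of independent jobs

    for i in range(N_CHUNKS):
        subprocess.run(
            ["sbatch",
             "--job-name", f"chunk-{i}",
             "--wrap", f"python process_chunk.py --chunk {i}"],
            check=True,
        )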
This article is also helpful.
Upvotes: 2
Reputation: 1
This looks like an HPC problem. Have a look at this link: https://cloud.google.com/solutions/architecture/highperformancecomputing.
There are a lot of valuable approaches to your problem, but it depends on the details of your case. A first simple approach could be to logically split your task into small jobs. Then you can assign a subset of these jobs to each GCE instance in your group of dedicated instances.
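A minimal sketch of the splitting step, assuming the pickle holds a list (or another sliceable collection) of records; file names and the chunk count are illustrative:

    # Split one large pickle into N smaller ones that independent
    # instances can then process in parallel.
    import pickle

    N_CHUNKS = 16

    with open("data.pkl", "rb") as f:  # the 8 GB input
        records = pickle.load(f)

    chunk_size = -(-len(records) // N_CHUNKS)  # ceiling division
    for i in range(N_CHUNKS):
        with open(f"chunk-{i}.pkl", "wb") as out:
            pickle.dump(records[i * chunk_size:(i + 1) * chunk_size], out)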
You can consider creating a group with a predefined number of instances. Each instance could rely on a startup script to fetch the job it must execute. When the job finishes, the instance can be deleted and replaced by a new one (a Google Compute Engine Managed Instance Group will create the new instance automatically). You only have to manage when the group should start and stop.
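One way for a worker to fetch its job is to read a custom metadata attribute from the GCE metadata server. A minimal sketch, assuming each instance was created with an attribute named chunk-id (the attribute name is illustrative):

    # Ask the GCE metadata server which chunk this instance owns.
    import urllib.request

    METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                    "instance/attributes/chunk-id")

    req = urllib.request.Request(METADATA_URL,
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:
        chunk_id = int(resp.read().decode())

    print(f"processing chunk {chunk_id}")
    # ... fetch chunk-<chunk_id>.pkl (e.g. from Cloud Storage) and process it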
Furthermore, you can consider preemptible instances, which are much cheaper.
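To tie it together, the workers themselves can be launched from a small Python driver around the gcloud CLI (assumed installed and authenticated); the zone and the startup script file are illustrative:

    # Launch one preemptible worker per chunk.
    import subprocess

    for i in range(16):
        subprocess.run(
            ["gcloud", "compute", "instances", "create", f"worker-{i}",
             "--zone=us-central1-a",
             "--preemptible",              # cheaper, but may be reclaimed
             f"--metadata=chunk-id={i}",   # read back by the worker
             "--metadata-from-file=startup-script=startup.sh"],
            check=True,
        )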
Hope this helps you. Bye
Upvotes: 0