CurlyFries

Reputation: 37

Batch processing on Google Compute Engine in Java

How do I get started with Compute Engine and set up a Java batch job that runs continuously at very short intervals (essentially constantly), reads from Google Datastore, processes the data, and writes back to Google Datastore?

Right now I have a game application running on GAE. When users initiate a game, an entity is stored in the Datastore. The game is somewhat time-based and I want to be able to frequently and efficiently check the games and send notifications if necessary. At the moment this is done by a task queue task that runs for 10 minutes and reschedules itself when it finishes. However, I do not feel that this is the correct way to handle it, and I will therefore migrate to GCE for better performance and scaling opportunities.
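For reference, the current scheduling looks roughly like this (queue name, handler path, and the actual checking logic are simplified placeholders):

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Simplified sketch of the current polling task: it checks the games for
    // roughly ten minutes, then re-enqueues itself so the checking never stops.
    public class CheckGamesServlet extends HttpServlet {

      @Override
      protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        long deadline = System.currentTimeMillis() + 10 * 60 * 1000;
        while (System.currentTimeMillis() < deadline) {
          checkGamesAndNotify(); // query Datastore and notify where needed (placeholder)
          try {
            Thread.sleep(1000); // small interval between checks
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
          }
        }
        // Schedule the next run of this same handler on the default push queue.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/tasks/check-games"));
      }

      private void checkGamesAndNotify() {
        // ... read game entities, compare against their deadlines, notify users ...
      }
    }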

I have read the GCE getting-started guide, but it only explains how to connect via SSH, install programs, and build a very simple website. Where can I find a guide that explains how to create an initial Java project aimed at GCE and using some of the Google APIs like Datastore? Any advice on how to get started is highly appreciated.

Upvotes: 1

Views: 1455

Answers (1)

Bill Prin

Reputation: 2518

Google Cloud DevRel has started publishing guides that clarify this exact topic, like http://cloud.google.com/python, http://cloud.google.com/nodejs, etc., but the Java guide won't be finished for a few months.

If you like fully controlling your infrastructure, you can definitely use GCE, but if I were you, I would stick with App Engine, since it automates a lot of the scaling you would otherwise have to do manually. GCE provides auto-scaling features, but they are more involved than App Engine's. If you want to see what they look like, the Python GCE section isn't especially specific to Python:

https://cloud.google.com/python/getting-started/run-on-compute-engine#multiple_instances

If you're finding App Engine limiting, you can look into migrating instead to Managed VMs, which is similar to App Engine but lets you do things like install custom libraries using a Dockerfile.

As for Task Queues, they are still officially supported, but if you are interested in massive scalability, you can check out Cloud Pub/Sub as well and see if it fits your needs.
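If it helps to see the shape of it, here is a minimal sketch of publishing a message with the Java Pub/Sub client; the "game-events" topic is a placeholder, and the exact class names depend on which version of the client library you pick up:

    import com.google.cloud.pubsub.v1.Publisher;
    import com.google.protobuf.ByteString;
    import com.google.pubsub.v1.PubsubMessage;
    import com.google.pubsub.v1.TopicName;

    public class GameEventPublisher {

      public static void publishGameStarted(String projectId, String gameId) throws Exception {
        // "game-events" is a hypothetical topic; create it once in your project.
        TopicName topic = TopicName.of(projectId, "game-events");
        Publisher publisher = Publisher.newBuilder(topic).build();
        try {
          PubsubMessage message = PubsubMessage.newBuilder()
              .setData(ByteString.copyFromUtf8(gameId))
              .build();
          // publish() is asynchronous; get() blocks until the message is accepted.
          String messageId = publisher.publish(message).get();
          System.out.println("Published message " + messageId);
        } finally {
          publisher.shutdown();
        }
      }
    }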

If your data size is getting large, Cloud Dataflow lets you run real-time streaming or batch jobs that read from Datastore and do some calculations on it. Cloud Dataflow can read from both Datastore and Pub/Sub queues.
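Just to give a rough idea of the shape of such a job, here is a sketch of a batch pipeline that reads game entities from Datastore, written against the open-source Apache Beam flavor of the Dataflow SDK; the project id and the "Game" kind are placeholders, and class names differ a bit between SDK versions:

    import com.google.datastore.v1.Entity;
    import com.google.datastore.v1.KindExpression;
    import com.google.datastore.v1.Query;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    public class GameCheckPipeline {

      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        // Read all entities of the (hypothetical) "Game" kind from Datastore.
        Query query = Query.newBuilder()
            .addKind(KindExpression.newBuilder().setName("Game"))
            .build();

        pipeline
            .apply("ReadGames", DatastoreIO.v1().read()
                .withProjectId("my-project") // placeholder project id
                .withQuery(query))
            .apply("CheckGames", ParDo.of(new DoFn<Entity, Entity>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                Entity game = c.element();
                // ... inspect the game, decide whether a notification is due ...
                c.output(game);
              }
            }));

        pipeline.run();
      }
    }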

If you want to interact with APIs like Pub/Sub or Datastore outside of the context of App Engine, the traditional client library is here:

https://developers.google.com/api-client-library/java/

There is also a newer project that provides friendlier, easier-to-use client libraries. They are still in an early state, but you can check them out here:

https://github.com/googlecloudplatform/gcloud-java
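For example, reading and writing a Datastore entity from outside App Engine (e.g. on a GCE instance) looks roughly like this with the gcloud-java Datastore client. Treat it as a sketch: method and package names are still moving around while the library matures, and the "Game" kind and its properties are placeholders.

    import com.google.cloud.datastore.Datastore;
    import com.google.cloud.datastore.DatastoreOptions;
    import com.google.cloud.datastore.Entity;
    import com.google.cloud.datastore.Key;
    import com.google.cloud.datastore.KeyFactory;

    public class GameStore {

      public static void main(String[] args) {
        // Uses the credentials and project id from the environment
        // (e.g. the service account when running on a Compute Engine instance).
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

        // "Game" kind and property names here are just placeholders.
        KeyFactory keyFactory = datastore.newKeyFactory().setKind("Game");
        Key key = keyFactory.newKey("game-123");

        Entity game = Entity.newBuilder(key)
            .set("state", "RUNNING")
            .set("deadline", System.currentTimeMillis() + 60_000)
            .build();
        datastore.put(game);

        Entity fetched = datastore.get(key);
        System.out.println("Game state: " + fetched.getString("state"));
      }
    }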

Overall, if your current App Engine and Task Queue solution works, I would stick with it. Based on what you're telling me, the biggest change I would make is this: instead of having your batch job poll every ten minutes, have the code that stores the entity in Datastore immediately kick off a Task Queue task or a Pub/Sub message that starts the background processing job.
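Concretely, that could look something like the following on App Engine; the handler URL, queue, and entity properties here are made-up placeholders:

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class GameService {

      public void startGame(String gameId, long durationMillis) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // Store the game entity as before ("Game" kind and properties are placeholders).
        Entity game = new Entity("Game", gameId);
        game.setProperty("endTime", System.currentTimeMillis() + durationMillis);
        datastore.put(game);

        // Immediately enqueue the background work for this game instead of
        // waiting for a polling job to find it. The countdown delays the task
        // until roughly when the game is due to be checked.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/tasks/process-game")
            .param("gameId", gameId)
            .countdownMillis(durationMillis));
      }
    }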

If you're interested in where the platform is heading, you can check out some of the links above. While you can roll your own solutions on GCE, to me the more interesting parts of the platform are products like Managed VMs and Cloud Dataflow, since they allow you to solve a lot of these problems at a much higher level and save you a lot of the headaches of setting up your own infrastructure. However, most of these are still in a beta stage, so they might have a few rough edges for a little while.

If this doesn't answer your question, leave any further questions in a comment and I will try to edit in the answers. And stay tuned for a much better guide to the whole platform for Java.

Upvotes: 2
