Nagesh Singh Chauhan

Reputation: 784

Automate an already built Google Dataflow pipeline created in Eclipse

I have created a Dataflow pipeline in Java using Eclipse, and I have the jar file of my pipeline application stored in Google Cloud Storage.

My requirement is to automate the whole process. As I understand it, this can be done by creating a cron job or by creating a template. Can anyone explain how this can be done?

EDIT: I am getting an error in StarterPipeline.run();
ArtifactServlet.java

package my.proj;

import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "ArtifactServlet", value = "/home/support/Ad-eff")
public class ArtifactServlet extends HttpServlet {

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        StarterPipeline.run();
    }
}

Upvotes: 0

Views: 550

Answers (1)

Lefteris S

Reputation: 1672

This article is a nice source on how to schedule Dataflow pipelines, either with the App Engine Cron Service or Cloud Functions. It is a bit outdated, as Cloud Functions were in alpha at the time it was published (they are now in beta), but it should still work ok.

App Engine cron job

An App Engine cron job invokes a URL defined as part of your App Engine app via HTTP GET. Due to Dataflow pipeline execution requirements, you will need to run this in the App Engine flexible environment. Here are the steps you need to take:

  1. Create a Servlet that calls the pipeline code and deploy it to App Engine.
  2. Create a cron.yaml file to configure the App Engine Cron Service to call the Servlet’s URL at a regular interval (a sketch follows this list).
  3. Deploy the cron job to App Engine.
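
For step 2, a minimal cron.yaml might look like the following. The URL matches the servlet mapping from the question, and the schedule is a placeholder you would adjust to your needs:

cron.yaml

cron:
- description: "kick off the Dataflow pipeline servlet"
  # path mapped to the servlet that starts the pipeline
  url: /home/support/Ad-eff
  # placeholder schedule; adjust as needed
  schedule: every 24 hours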

Cloud Functions

With Cloud Functions you write Node.js functions which respond to a number of different events/triggers such as Pub/Sub messages, Cloud Storage changes and HTTP invocations. So, you can write a Cloud Function executing a Dataflow pipeline that can have any of these Cloud Function triggers to kickstart the Dataflow pipeline.

  1. Create a Node.js module for your Cloud Function and call your jar from it (for example using spawn). You could use the module provided in this link as a basis. Note that you will need to provide your own Java runtime folder along with the Cloud Function code. Make sure that your Node.js module is named index.js and that all Java dependencies are in the same folder.
  2. Deploy your function.
  3. Schedule triggering. The most reliable way to do this is again to use the App Engine cron service (either the standard or the flexible environment). For example, you could deploy your function with a Pub/Sub trigger by running something like this from a shell: gcloud beta functions deploy myFunction --trigger-resource my-topic --trigger-event google.pubsub.topic.publish. You can then create a Servlet that publishes an empty message to my-topic (see the sketch below). From that point on, follow steps 2 and 3 from the App Engine cron job description above.
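
For the publishing Servlet in step 3, a minimal sketch could look like the following, assuming the google-cloud-pubsub Java client library is on the classpath. The servlet name, URL mapping and project ID (my-project) are placeholders, not something from the question:

TriggerServlet.java

package my.proj;

import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

// Hypothetical servlet: the App Engine cron job hits this URL, which publishes an
// empty message to my-topic; the Pub/Sub-triggered Cloud Function then runs the pipeline.
@WebServlet(name = "TriggerServlet", value = "/trigger-pipeline")
public class TriggerServlet extends HttpServlet {

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // "my-project" is a placeholder for your GCP project ID
        TopicName topic = TopicName.of("my-project", "my-topic");
        Publisher publisher = Publisher.newBuilder(topic).build();
        try {
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("")) // empty payload; the event itself is the trigger
                    .build();
            publisher.publish(message);                   // returns an ApiFuture<String> with the message ID
        } finally {
            publisher.shutdown();                         // release the publisher's resources
        }
    }
}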

Upvotes: 3
