CloudGalore

Reputation: 11

Error Launching CloudDataFlow Java App using Cloud Composer

I'm a GCP newbie and am getting an error when trying to run a Cloud Dataflow app for the BeamTutorial using GCP Cloud Composer's DataflowJavaOperator. Airflow picks up the pipeline but fails with the error below.

gcp_dataflow_hook.py:115} INFO - Running command: java -cp /tmp/dataflow13ec2a50-BeamTutorial-0.0.1-SNAPSHOT.jar org.apache.beam.examples.tutorial.game.solution.Exercise2 --runner=DataflowRunner --project=..... --region=us-central1 --labels={"airflow-version":"v1-9-0-composer"} --jobName=run-beam-data-flow-java-1449a1da --outputPrefix=gs://..../ex2-spark/out
gcp_dataflow_hook.py:127} WARNING - Error: A JNI error has occurred, please check your installation and try again
[2018-10-18 09:35:00,316] {base_task_runner.py:98} INFO - Subtask: Exception in thread "main" java.lang.NoClassDefFoundError:org/apache/beam/sdk/options/PipelineOptions

This BeamTutorial-0.0.1-SNAPSHOT.jar is not a fat jar. The job runs successfully in Dataflow when submitted manually from GCP Cloud Shell, as below:

mvn compile exec:java -Dexec.mainClass="org.apache.beam.examples.tutorial.game.solution.Exercise2" -Dexec.args="--runner=dataflow --project=<project-name> --outputPrefix=gs://..../beam-tutorial/ex2-spark/out" -Pdataflow-runner

I'd appreciate any help in fixing this error. Thank you.

Upvotes: 1

Views: 1909

Answers (1)

VictorGGl

Reputation: 1916

When using the DataFlowJavaOperator you need to follow instructions here on how to create your ".jar" file:

  • Add the dependency and plugin from the link
  • Run mvn package to create your ".jar" file

Once you do that, I'd advise making sure that the ".jar" file actually runs correctly on its own before trying to run it inside Composer. In this case, following the tutorial and running:

java -jar target/BeamTutorial-0.0.1-SNAPSHOT.jar --runner=DataflowRunner --project=<my-project> --tempLocation=<my-bucket>

I also get:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/sdk/options/PipelineOptions
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.options.PipelineOptions
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more

So the issue looks Java-related rather than Composer-related: either the pom is configured in a way that does not produce a valid ".jar" file (one that bundles the Beam classes), or the jar expects some additional parameters. In any case, you should troubleshoot the ".jar"/pom before going further.
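One quick way to troubleshoot the ".jar" is to check whether the class named in the NoClassDefFoundError is actually packaged inside it. The following is only a minimal sketch, assuming the jar is a regular zip archive and that a self-contained (fat) jar would carry the Beam classes as ordinary .class entries; the helper names jar_contains_class and missing_classes are made up for this illustration and are not part of Airflow or Beam.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for a fully qualified class name.

    Hypothetical helper for illustration: a jar is a zip archive, so the class
    org.example.Foo is stored as the entry org/example/Foo.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def missing_classes(jar_path, class_names):
    """Return the subset of class_names that are not packaged in the jar."""
    return [name for name in class_names if not jar_contains_class(jar_path, name)]


# Example call (paths and class names taken from the question):
# missing_classes(
#     "target/BeamTutorial-0.0.1-SNAPSHOT.jar",
#     [
#         "org.apache.beam.examples.tutorial.game.solution.Exercise2",
#         "org.apache.beam.sdk.options.PipelineOptions",
#     ],
# )
```

If org.apache.beam.sdk.options.PipelineOptions comes back as missing, the jar does not bundle the Beam SDK, which would be consistent with the error above when the jar is launched with plain java instead of through Maven.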

I have run some of my other pipelines successfully using the DataflowJavaOperator and a valid ".jar" file.

Upvotes: 2
