Reputation: 11
I am a GCP newbie and am facing an error when trying to run a Cloud Dataflow app for the BeamTutorial using GCP Cloud Composer's DataFlowJavaOperator. Airflow picks up the pipeline but fails with the error below.
gcp_dataflow_hook.py:115} INFO - Running command: java -cp /tmp/dataflow13ec2a50-BeamTutorial-0.0.1-SNAPSHOT.jar org.apache.beam.examples.tutorial.game.solution.Exercise2 --runner=DataflowRunner --project=..... --region=us-central1 --labels={"airflow-version":"v1-9-0-composer"} --jobName=run-beam-data-flow-java-1449a1da --outputPrefix=gs://..../ex2-spark/out
gcp_dataflow_hook.py:127} WARNING - Error: A JNI error has occurred, please check your installation and try again
[2018-10-18 09:35:00,316] {base_task_runner.py:98} INFO - Subtask: Exception in thread "main" java.lang.NoClassDefFoundError:org/apache/beam/sdk/options/PipelineOptions
BeamTutorial-0.0.1-SNAPSHOT.jar is not a fat jar, but the job runs successfully in Dataflow when submitted manually from GCP Cloud Shell as below:
mvn compile exec:java -Dexec.mainClass="org.apache.beam.examples.tutorial.game.solution.Exercise2" -Dexec.args="--runner=dataflow --project=<project-name> --outputPrefix=gs://..../beam-tutorial/ex2-spark/out" -Pdataflow-runner
I'd appreciate any help in fixing this error. Thank you.
Upvotes: 1
Views: 1909
Reputation: 1916
When using the DataFlowJavaOperator, you need to follow the instructions here on how to create your ".jar" file, which is to run:
mvn package
Once you do that, I'd advise making sure that the ".jar" file actually runs correctly before trying to run it inside Composer. So in this case, following the tutorial and running:
java -jar target/BeamTutorial-0.0.1-SNAPSHOT.jar --runner=DataflowRunner --project=<my-project> --tempLocation=<my-bucket>
I also get:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/sdk/options/PipelineOptions
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.options.PipelineOptions
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
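So the issue looks Java-related rather than Composer-related: the NoClassDefFoundError means the Beam SDK classes are not on the classpath at run time, so either the pom is not configured to create a valid ".jar" file, or the jar expects some additional parameters. In any case, you should troubleshoot the ".jar"/pom before going further.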
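One quick way to check the ".jar" without launching a job is to look inside it for the class named in the stack trace, since a jar is just a zip archive. Below is a minimal sketch in Python, assuming the jar was built to target/BeamTutorial-0.0.1-SNAPSHOT.jar as in the command above; jar_contains is a hypothetical helper name used only for illustration, not part of any library.

```python
import zipfile

# Class the JVM failed to load, taken from the stack trace above.
# Inside a jar, classes are stored as paths ending in ".class".
MISSING_CLASS = "org/apache/beam/sdk/options/PipelineOptions.class"


def jar_contains(jar_path, entry=MISSING_CLASS):
    """Return True if the jar at jar_path has an entry with this exact name.

    jar_contains is a hypothetical helper for this answer; it only inspects
    the archive listing and does not load or run any Java code.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_contains("target/BeamTutorial-0.0.1-SNAPSHOT.jar") and getting False would confirm that the Beam SDK is not bundled, which matches the error: "mvn compile exec:java" works because Maven puts the project's dependencies on the classpath, while plain "java" only sees what is inside the jar.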
I have run some of my other pipelines successfully using the DataFlowJavaOperator and a valid ".jar" file.
Upvotes: 2