Malte

Reputation: 604

Google Cloud Dataflow DatastoreIO Dependency Problems for Read and Write

What I need: the right combination of dependency versions to read from and write to Datastore in Dataflow (v1.9.0) via DatastoreIO.v1().read()/write(), and to know which dependencies need to be referenced in the pom.

Dataflow-specific dependencies referenced in the pom (from the Maven repo) for Dataflow 1.9.0:

com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-all/1.9.0
com.google.cloud.datastore/datastore-v1-protos/1.0.1
com.google.cloud.datastore/datastore-v1-proto-client/1.1.0
com.google.protobuf/protobuf-java/3.0.0-beta-1
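
In the pom these map to entries like the following (a sketch; the coordinates are exactly as listed above):

<dependency>
    <groupId>com.google.cloud.dataflow</groupId>
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.datastore</groupId>
    <artifactId>datastore-v1-protos</artifactId>
    <version>1.0.1</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.datastore</groupId>
    <artifactId>datastore-v1-proto-client</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.0.0-beta-1</version>
</dependency>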

When writing to Datastore (actually, when building the entities) I get the following exception:

// CamelExecutionException (the setup runs with Camel routes, but for development purposes not in Fuse but as a local Camel route in Eclipse)
Caused by: java.lang.NoClassDefFoundError: com/google/protobuf/GeneratedMessageV3
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.google.datastore.v1.Value.toBuilder(Value.java:749)
    at com.google.datastore.v1.Value.newBuilder(Value.java:743)
    at xmlsource.dataflow.test.EntityUtil.getStringValue(EntityUtil.java:404)
    at xmlsource.dataflow.test.EntityUtil.getArticleEntity(EntityUtil.java:152)
    at xmlsource.dataflow.test.parser.ArticleToEntity.processElement(ArticleToEntity.java:21)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateHelper(ParDo.java:1229)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateSingleHelper(ParDo.java:1098)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.access$300(ParDo.java:457)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1084)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1079)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:103)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:260)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:814)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:526)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:96)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
    at xmlsource.dataflow.test.PipelineParseTest.createAndRun(PipelineParseTest.java:208)
    at xmlsource.dataflow.test.PipelineTester.process(PipelineTester.java:11)
    at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
    ... 8 more

The referenced line in xmlsource.dataflow.test.EntityUtil.getStringValue(EntityUtil.java:404):

Value.newBuilder().setStringValue(value).build();

And when reading, it's more or less the same:

java.lang.NoClassDefFoundError: com/google/protobuf/GeneratedMessageV3
…

When changing the dependencies to the following (identical, except protobuf-java is no longer the beta version):

com.google.cloud.datastore/datastore-v1-protos/1.0.1
com.google.cloud.datastore/datastore-v1-proto-client/1.1.0
com.google.protobuf/protobuf-java/3.0.0

and trying to write, the following exception occurs:

// CamelExecutionException...
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    com/google/datastore/v1/Value$Builder.mergeGeoPointValue(Lcom/google/type/LatLng;)Lcom/google/datastore/v1/Value$Builder; @76: invokevirtual
  Reason:
    Type 'com/google/type/LatLng' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
  Current Frame:
    bci: @76
    flags: { }
    locals: { 'com/google/datastore/v1/Value$Builder', 'com/google/type/LatLng' }
    stack: { 'com/google/protobuf/SingleFieldBuilder', 'com/google/type/LatLng' }
  Bytecode:
    someBytecode                                    
  Stackmap Table:
    same_frame(@50)
    same_frame(@55)
    same_frame(@62)
    same_frame(@80)
    same_frame(@89)

    at com.google.datastore.v1.Value.toBuilder(Value.java:749)
    at com.google.datastore.v1.Value.newBuilder(Value.java:743)
    at xmlsource.dataflow.test.EntityUtil.getStringValue(EntityUtil.java:404)
    at xmlsource.dataflow.test.EntityUtil.getArticleEntity(EntityUtil.java:152)
    at xmlsource.dataflow.test.parser.ArticleToEntity.processElement(ArticleToEntity.java:21)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateHelper(ParDo.java:1229)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateSingleHelper(ParDo.java:1098)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.access$300(ParDo.java:457)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1084)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1079)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:103)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:260)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:814)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:526)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:96)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
    at xmlsource.dataflow.test.PipelineParseTest.createAndRun(PipelineParseTest.java:208)
    at xmlsource.dataflow.test.PipelineTester.process(PipelineTester.java:11)
    at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)

Here the exception references the method mergeGeoPointValue, although my code never calls any method that sets LatLng or GeoPoint values. (That is consistent with a VerifyError: the bytecode verifier checks every method of Value$Builder when the class is loaded, not when a method is invoked, so the incompatibility in mergeGeoPointValue surfaces even though only setStringValue is used.) The referenced line in my code again just sets the string value.

When reading, I get the same exception, again when transforming the POJO into a Datastore entity:

Value.newBuilder().setStringValue("someString").build()

The whole Query:

Query query = Query.newBuilder()
  .addKind(KindExpression.newBuilder()
    .setName("test_article").build())
  .setFilter(Filter.newBuilder()
    .setPropertyFilter(PropertyFilter.newBuilder()
      .setProperty(PropertyReference.newBuilder()
        .setName("somePropertyName"))
      .setOp(PropertyFilter.Operator.EQUAL)
      .setValue(Value.newBuilder()
        .setStringValue("someString").build())
      .build())
    .build())
  .build();

Changing the dependencies to the following (datastore-v1-protos 1.3.0):

com.google.cloud.datastore/datastore-v1-protos/1.3.0
com.google.cloud.datastore/datastore-v1-proto-client/1.1.0
com.google.protobuf/protobuf-java/3.0.0 (or 3.2.0)

With this setup I can successfully write to Datastore via .apply(DatastoreIO.v1().write().withProjectId("someProjectId"));

When trying to read, the Query object is built successfully, but...:

// CamelExecutionException
Caused by: java.lang.NoSuchMethodError: com.google.datastore.v1.Query$Builder.clone()Lcom/google/protobuf/GeneratedMessage$Builder;
    at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:648)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateHelper(ParDo.java:1229)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateSingleHelper(ParDo.java:1098)
    at com.google.cloud.dataflow.sdk.transforms.ParDo.access$300(ParDo.java:457)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1084)
    at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:1079)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:103)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:260)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:814)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:526)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:96)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
    at xmlsource.dataflow.test.PipelineParseTest.createAndRun(PipelineParseTest.java:208)
    at xmlsource.dataflow.test.PipelineTester.process(PipelineTester.java:11)
    at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
    ... 8 more

The line where I try to read from Datastore:

PCollection<Entity> entityCollection = p.apply(
  DatastoreIO.v1().read().withNamespace("test_ns_df")
    .withProjectId("someProjectId")
    .withQuery(query));

EDIT: When using the dependencies (and parent pom) from the GitHubDataflowExample, I again get java.lang.NoClassDefFoundError: com/google/protobuf/GeneratedMessageV3 when building the Value for the query....

So I never got the read to work... Has anyone experienced similar problems and found out how to solve them? Or do I need to build the values differently? The same exceptions occur when using DatastoreHelper.makeValue... The dependencies referenced in a working project would also help a lot!
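
For reference, the DatastoreHelper variant looks like this (a sketch; makeValue is a static helper in com.google.datastore.v1.client.DatastoreHelper from datastore-v1-proto-client):

import com.google.datastore.v1.Value;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

// Builds the same generated protobuf Value as
// Value.newBuilder().setStringValue(...), so it loads the same
// generated classes and fails with the same errors.
Value value = makeValue("someString").build();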

I thought this would be a dependency/version problem, but maybe one of you knows better. I can't be the first one to hit java.lang.NoSuchMethodError: com.google.datastore.v1.Query$Builder.clone(); the asker of NoSuchMethodError in DatastoreWordCount example hit the same error after pulling a wrong version, but on my end fixing the versions doesn't lead to success.

Thanks in advance

Upvotes: 1

Views: 452

Answers (1)

Malte

Reputation: 604

Found the problem:

Because a preprocessing step in the same project, running with Camel/Fuse, stores files in Google Cloud Storage, I had a dependency on google-cloud-storage:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>0.6.0</version>
</dependency>

This dependency was listed in the pom.xml BEFORE the Dataflow dependency. After switching the order of the dependencies (Dataflow before Storage) and removing all other dependencies, DatastoreIO works perfectly! Then, depending on your operations (for example an XMLSource), some runtime dependencies need to be added; a sketch of the corrected ordering follows.
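
For illustration, a minimal sketch of the corrected ordering (versions as above, all other dependencies stripped). Maven mediates conflicting transitive versions by picking the nearest declaration, and at equal depth the one declared first, so listing the Dataflow SDK first should let its protobuf and datastore versions win:

<dependencies>
    <!-- Dataflow first: its transitive protobuf/datastore versions
         take precedence over those pulled in by google-cloud-storage -->
    <dependency>
        <groupId>com.google.cloud.dataflow</groupId>
        <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
        <version>1.9.0</version>
    </dependency>
    <!-- Storage afterwards -->
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>0.6.0</version>
    </dependency>
</dependencies>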

Upvotes: 1
