Reputation: 13
I'm new to GCP and need some help with the following: I have a .json file uploaded to Cloud Storage and need to move the data into Cloud Datastore for parsing/queries.
I think a large dataset may take too long to import natively, so I was interested in using Dataflow to transform and load it. Any ideas or help would be much appreciated.
Upvotes: 1
Views: 3209
Reputation: 727
This is a fairly straightforward problem. You'll need to:
1. Review the basics of writing Dataflow pipelines: https://beam.apache.org/documentation/pipelines/design-your-pipeline/
2. Read from GCS with TextIO: https://beam.apache.org/documentation/sdks/javadoc/0.2.0-incubating/org/apache/beam/sdk/io/TextIO.html
3. Transform the JSON into Datastore entities: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/coders/TableRowJsonCoder (or similar)
4. Write to Datastore: https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore
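
For concreteness, here's a minimal sketch of what steps 1–4 could look like with the current Beam Java SDK (2.x-style API), using Gson for the JSON parsing (any JSON library works). It assumes newline-delimited JSON, since TextIO emits one element per line; a single file containing one big JSON array won't split this way. The kind name `MyKind`, the `id` and `name` fields, the bucket path, and the project ID are all placeholders for your own data:

```java
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class JsonToDatastore {

  // Parses one line of newline-delimited JSON into a Datastore Entity.
  // "MyKind", "id", and "name" are placeholders for your own schema.
  static class JsonToEntityFn extends DoFn<String, Entity> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      JsonObject json = new JsonParser().parse(c.element()).getAsJsonObject();

      // Every entity needs a complete key; this assumes each record
      // carries a unique "id" field to use as the key name.
      Key key = makeKey("MyKind", json.get("id").getAsString()).build();

      c.output(Entity.newBuilder()
          .setKey(key)
          .putProperties("name", makeValue(json.get("name").getAsString()).build())
          .build());
    }
  }

  public static void main(String[] args) {
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadFromGCS", TextIO.read().from("gs://my-bucket/data/*.json"))
     .apply("JsonToEntity", ParDo.of(new JsonToEntityFn()))
     .apply("WriteToDatastore",
         DatastoreIO.v1().write().withProjectId("my-gcp-project"));

    p.run().waitUntilFinish();
  }
}
```

DatastoreIO.v1().write() handles batching the mutations for you; the main thing to get right is that every record has something unique to key on, since duplicate keys will overwrite each other.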
Hope this helps!
Upvotes: 3