igrigorik

Reputation: 9601

Best way to process a GCS file within Dataflow?

I have a PCollection of matched GCS filenames, each of which contains a single compressed JSON blob. What's the best way to read the entire file, decompress it (Gzip format), and JSON decode it?

Are there any existing APIs and/or examples that can give me a head start? Seems like this would be a pretty common use case.

Upvotes: 2


Answers (1)

Sam McVeety

Reputation: 3214

This isn't natively supported in Dataflow. To read each file as a single JSON blob, you could implement a custom FileBasedSource:

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/FileBasedSource
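Here is a minimal sketch of the per-file work such a source's reader (or a DoFn) would do: read the whole gzip stream and return its contents as one string. It uses only the Java standard library, not any Dataflow API. The class name `GzipBlobReader` and the method `readWholeGzip` are hypothetical. The sketch assumes each file fits in memory and is UTF-8 encoded. JSON decoding is left to a JSON library of your choice, for example Jackson's `ObjectMapper.readTree`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the Dataflow SDK.
public class GzipBlobReader {

  /**
   * Reads an entire gzip-compressed stream and returns its contents as a string.
   * Assumes the decompressed data is UTF-8 and small enough to hold in memory.
   */
  public static String readWholeGzip(InputStream compressed) throws IOException {
    try (GZIPInputStream gzip = new GZIPInputStream(compressed);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buffer = new byte[8192];
      int n;
      // Copy decompressed bytes until the end of the gzip stream.
      while ((n = gzip.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build an in-memory gzip blob standing in for the contents of one GCS file.
    String json = "{\"name\":\"example\",\"count\":3}";
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
      gz.write(json.getBytes(StandardCharsets.UTF_8));
    }
    // Decompress it back into the original JSON text, ready for a JSON parser.
    String decoded = readWholeGzip(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(decoded);
  }
}
```

A gzip stream can't be decoded starting from an arbitrary byte offset. For that reason, a source built around this helper would need to treat each file as a single, unsplittable unit and emit one record per file.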

If that's enough to get started, we can continue to update this answer with more information.

Upvotes: 2
