user10540949

Apache Beam won't write files to local env or Google Storage

For some reason Apache Beam won't write files to my local environment or to Google Storage. My goal is to read data from Google Pub/Sub and write it in batches to Google Storage. To do this I have the following code:

        pipeline.begin()
            .apply(PubsubIO.readStrings()
                    .fromSubscription("projects/PROJECT/subscriptions/SUBNAME"))
            .apply(ParDo.of(new UpperCaseAndPrint()))
            .apply(Window.into(FixedWindows.of(Duration.millis(1000))))
            .apply(TextIO.write().to("gs://BUCKETNAME/outputData")
                .withWindowedWrites()
                .withNumShards(1));

The ParDo function prints the incoming messages, so data does appear to be arriving:

19806 [direct-runner-worker] INFO  app  - message-4
19807 [direct-runner-worker] INFO  app  - message-3
19808 [direct-runner-worker] INFO  app  - message-2
19809 [direct-runner-worker] INFO  app  - message-1

Does anyone have an idea why the files aren't created locally or in the Google Storage bucket?

Upvotes: 2

Views: 564

Answers (1)

user10540949

It turns out that the DirectRunner has problems when TextIO is used together with PubsubIO. The problem disappears with another runner, such as the Dataflow runner.

I couldn't solve the problem with the local runner, but I hope this helps the next person who runs into it and finds this post.
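
For anyone who wants to try the same switch, here is a minimal sketch of running the pipeline from the question on the Dataflow runner. It assumes the `beam-runners-google-cloud-dataflow-java` dependency is on the classpath and that GCP credentials are configured. The class name `RunOnDataflow` and the `gs://BUCKETNAME/temp` location are placeholders I made up, and the `UpperCaseAndPrint` step is left out to keep the example self-contained. Treat it as a starting point, not a verified fix.

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Hypothetical class name; only the runner selection differs from the question.
public class RunOnDataflow {
    public static void main(String[] args) {
        // Other settings (for example --region) can be passed on the command line.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);

        // Select the Dataflow runner instead of the default DirectRunner.
        options.setRunner(DataflowRunner.class);
        // Pub/Sub is an unbounded source, so run in streaming mode.
        options.setStreaming(true);
        options.setProject("PROJECT");                   // placeholder from the question
        options.setTempLocation("gs://BUCKETNAME/temp"); // assumed temp path

        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply(PubsubIO.readStrings()
                .fromSubscription("projects/PROJECT/subscriptions/SUBNAME"))
            // One-second fixed windows, as in the question.
            .apply(Window.<String>into(FixedWindows.of(Duration.millis(1000))))
            .apply(TextIO.write().to("gs://BUCKETNAME/outputData")
                .withWindowedWrites()
                .withNumShards(1));

        pipeline.run();
    }
}
```

The runner can also be chosen without code changes by passing `--runner=DataflowRunner` as a program argument, since `PipelineOptionsFactory.fromArgs` picks it up.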

Upvotes: 3
