For some reason Apache Beam won't write files to my local environment or to Google Storage. My goal is to read data from Google Pub/Sub and write it in batches to Google Storage. To do this I have the following code:
pipeline.begin()
    .apply(PubsubIO.readStrings()
        .fromSubscription("projects/PROJECT/subscriptions/SUBNAME"))
    .apply(ParDo.of(new UpperCaseAndPrint()))
    .apply(Window.into(FixedWindows.of(Duration.millis(1000))))
    .apply(TextIO.write().to("gs://BUCKETNAME/outputData")
        .withWindowedWrites()
        .withNumShards(1));
The ParDo function prints the messages as they arrive, and there does seem to be data coming in:
19806 [direct-runner-worker] INFO app - message-4
19807 [direct-runner-worker] INFO app - message-3
19808 [direct-runner-worker] INFO app - message-2
19809 [direct-runner-worker] INFO app - message-1
Does anyone have an idea why the files won't be created locally or in the Google Storage bucket?
Upvotes: 2
Views: 564
So it turns out that there are some issues with the DirectRunner when combining TextIO with PubsubIO. The problem disappears when using another runner, such as the Dataflow runner. I can't really solve the local issue, but I hope this helps the next person who encounters it and finds this post.
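For reference, switching to the Dataflow runner is mostly a matter of pipeline options. Below is a minimal sketch of how that setup might look; it assumes the Beam SDK and the Dataflow runner module are on the classpath, and the project, bucket, and class names are placeholders, not values from the original post:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSwitchExample {
    public static void main(String[] args) {
        // Parse any --project / --tempLocation flags from the command line,
        // then override the runner explicitly.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);         // instead of the default DirectRunner
        options.setProject("PROJECT");                   // placeholder GCP project id
        options.setTempLocation("gs://BUCKETNAME/temp"); // placeholder staging location
        options.setStreaming(true);                      // PubsubIO is an unbounded source

        Pipeline pipeline = Pipeline.create(options);
        // ... build the same PubsubIO -> ParDo -> Window -> TextIO pipeline here ...
        pipeline.run();
    }
}
```

With these options the same pipeline code is submitted to Dataflow instead of executing locally, which is where the TextIO writes started working for me.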
Upvotes: 3