RKP

Reputation: 880

GCP Dataflow: print PCollection data

I am new to GCP Dataflow and want to know whether there is a way to print all the values in a PCollection.

Pipeline p = Pipeline.create(options);
PCollection<String> lines = p.apply("ReadLines", TextIO.read().from(options.getInputFile()));

Here, I want to print and check all the values in lines (a PCollection<String>).

Similarly, I want to access all the values in words after the operation below:

PCollection<String> words = lines.apply(
            FlatMapElements.into(TypeDescriptors.strings())
                    .via((String line) -> Arrays.asList(line.split(" "))));

Upvotes: 2

Views: 4344

Answers (2)

sunitha

Reputation: 1528

In your main method, apply a ParDo that prints each element:

p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
 .apply("Print", ParDo.of(new PrintElementFn()));

Then define the DoFn as a static nested class:

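// Prints each element. DoFn<String, Void> emits no output,
// so this must be the last step on its branch of the pipeline.
private static class PrintElementFn extends DoFn<String, Void> {
    @ProcessElement
    public void processElement(@Element String input) {
        System.out.println(input);
    }
}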
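
To check what would end up in words as well, one option is to exercise the per-line logic on sample input outside the pipeline. The following is a minimal sketch in plain Java with no Beam dependency. The class PreviewWords, the helpers splitLine and printAndReturn, and the sample lines are all hypothetical names and data introduced for illustration; only the split(" ") logic comes from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the Beam SDK.
class PreviewWords {

    // Same per-line logic as the lambda passed to FlatMapElements in the question.
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Prints an element with a label and returns it unchanged, so it can sit
    // in the middle of a chain of transformations without ending it.
    static <T> T printAndReturn(String label, T element) {
        System.out.println(label + ": " + element);
        return element;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the lines read by TextIO.
        List<String> sampleLines = Arrays.asList("hello beam world", "two  spaces");
        for (String line : sampleLines) {
            printAndReturn("lines", line);
            for (String word : splitLine(line)) {
                printAndReturn("words", word);
            }
        }
    }
}
```

Note that split(" ") yields an empty string between consecutive spaces, so the second sample line produces an empty element. Inside the pipeline, a pass-through helper like this could presumably be called from a lambda, e.g. MapElements.into(TypeDescriptors.strings()).via((String w) -> printAndReturn("words", w)), which forwards each element so later steps can still consume it. When the job runs on the Dataflow service rather than locally, System.out output typically ends up in the worker logs, not the local console.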

Upvotes: 3

Eric Schmidt

Reputation: 1317

You need to process the PCollection in a ParDo; see the Apache Beam documentation on ParDo. Within the ParDo's DoFn you can inspect each element, for example by printing or logging it.

Upvotes: 2
