Reputation: 880
I am new to GCP Dataflow and want to know whether there is any way to print all the values of a PCollection.
Pipeline p = Pipeline.create(options);
PCollection<String> lines = p.apply("ReadLines", TextIO.read().from(options.getInputFile()));
Here, I want to print and check all the values in lines (a PCollection).
Similarly, I want to access all the values in words after the operation below:
PCollection<String> words = lines.apply(
FlatMapElements.into(TypeDescriptors.strings())
.via((String line) -> Arrays.asList(line.split(" "))));
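A minimal sketch of printing both lines and words, assuming you are running locally on the DirectRunner. The print step is a pass-through MapElements that prints each element and emits it unchanged, so the pipeline can keep transforming words afterwards. Create.of with in-memory strings stands in for TextIO here (an assumption to keep the example self-contained), and the class and method names (PrintPCollectionExample, printAndPass, run) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

class PrintPCollectionExample {

  // Hypothetical helper: prints each element with a prefix, then returns it unchanged
  // so downstream transforms still receive every element.
  static MapElements<String, String> printAndPass(String prefix) {
    return MapElements.into(TypeDescriptors.strings())
        .via((String s) -> {
          System.out.println(prefix + s);
          return s;
        });
  }

  // Builds the question's pipeline on in-memory input (Create.of replaces TextIO here).
  static PipelineResult.State run(List<String> input) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<String> lines =
        p.apply("CreateLines", Create.of(input))
            .apply("PrintLines", printAndPass("line: "));

    // words is still a normal PCollection and can be transformed further.
    PCollection<String> words =
        lines
            .apply("SplitWords", FlatMapElements.into(TypeDescriptors.strings())
                .via((String line) -> Arrays.asList(line.split(" "))))
            .apply("PrintWords", printAndPass("word: "));

    // Nothing executes, and nothing is printed, until the pipeline is run.
    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    run(Arrays.asList("hello world", "apache beam"));
  }
}
```

Element order in the output is not guaranteed, since Beam processes a PCollection in parallel.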
Upvotes: 2
Views: 4344
Reputation: 1528
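In your main function:
p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
    .apply("Print", ParDo.of(new PrintElementFn()));
// Nothing is printed until the pipeline actually runs.
p.run().waitUntilFinish();
Then define PrintElementFn as a static nested class: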
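private static class PrintElementFn extends DoFn<String, Void> {
    @ProcessElement
    public void processElement(@Element String input) {
        // With the DirectRunner this prints to your local console.
        // On Dataflow it runs on the workers, so the output shows up in the worker logs.
        System.out.println(input);
    }
}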
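When the job runs on Dataflow rather than locally, a logger is usually more convenient than System.out, because worker log messages can be viewed and filtered in Cloud Logging. The sketch below is a variant of PrintElementFn that logs through SLF4J (which the Beam Java SDK already depends on); the class names LogElementsExample and LogElementFn are hypothetical, and Create.of again stands in for TextIO.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LogElementsExample {

  // Same shape as PrintElementFn, but writes each element to the log instead of stdout.
  static class LogElementFn extends DoFn<String, Void> {
    // static, so the logger is not serialized along with the DoFn
    private static final Logger LOG = LoggerFactory.getLogger(LogElementFn.class);

    @ProcessElement
    public void processElement(@Element String input) {
      LOG.info("Element: {}", input);
    }
  }

  // Runs on whichever runner the default options select (DirectRunner if it is on the classpath).
  static PipelineResult.State run(String... elements) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply("CreateLines", Create.of(Arrays.asList(elements)))
        .apply("Log", ParDo.of(new LogElementFn()));
    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    run("first line", "second line");
  }
}
```

Locally, the messages only appear if an SLF4J binding (for example slf4j-simple) is on the classpath.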
Upvotes: 3
Reputation: 1317
You need to process the PCollection in a ParDo (see the Apache Beam ParDo documentation). Inside the ParDo you can inspect each element.
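For a large PCollection, printing every element is rarely practical. A ParDo can instead inspect each element and record what it finds in a Beam metric, which you read back after the pipeline finishes. The sketch below (the names InspectWordsExample, InspectWordFn and countEmptyWords are hypothetical) counts the empty strings that line.split(" ") produces when a line contains two consecutive spaces, and passes every word through unchanged. It assumes the DirectRunner, which reports attempted metric values.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.TypeDescriptors;

class InspectWordsExample {

  // Inspects each word: increments a counter for empty words, then emits the word unchanged.
  static class InspectWordFn extends DoFn<String, String> {
    private final Counter emptyWords = Metrics.counter(InspectWordFn.class, "emptyWords");

    @ProcessElement
    public void processElement(@Element String word, OutputReceiver<String> out) {
      if (word.isEmpty()) {
        emptyWords.inc();
      }
      out.output(word);
    }
  }

  // Runs the split-and-inspect pipeline and returns the attempted value of the counter.
  static long countEmptyWords(List<String> lines) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply("CreateLines", Create.of(lines))
        .apply("SplitWords", FlatMapElements.into(TypeDescriptors.strings())
            .via((String line) -> Arrays.asList(line.split(" "))))
        .apply("InspectWords", ParDo.of(new InspectWordFn()));

    PipelineResult result = p.run();
    result.waitUntilFinish();

    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named(InspectWordFn.class, "emptyWords"))
            .build());
    long total = 0;
    for (MetricResult<Long> counter : metrics.getCounters()) {
      total += counter.getAttempted();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(countEmptyWords(Arrays.asList("hello  world", "apache beam")));
  }
}
```

With the sample input, "hello  world".split(" ") yields "hello", "" and "world", so the counter should report one empty word.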
Upvotes: 2