Yannick Pezeu

Reputation: 818

What is the "p" in Beam, please?

I see this "p | " at the beginning of Beam pipelines and I do not understand what this p means.

Example code of a Beam pipeline, from this introductory tutorial: https://beam.apache.org/documentation/sdks/python-streaming/

lines = p | 'read' >> ReadFromText(known_args.input)
...

counts = (lines
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(six.text_type))
          | 'pair_with_one' >> beam.Map(lambda x: (x, 1))
          | 'group' >> beam.GroupByKey()
          | 'count' >> beam.Map(count_ones))
...

output = counts | 'format' >> beam.Map(format_result)

# Write the output using a "Write" transform that has side effects.
output | 'write' >> WriteToText(known_args.output)

What I understand:

I understand the concepts of PCollections and PTransforms, and Beam's aim of treating streaming data and batch data the same way.

What I don't understand:

What is this p? What are the parentheses for? The pipes? The >>? It looks like bash-style code, but it is not explained anywhere.

Does anyone have an explanation, or a link to a tutorial that starts from the beginning?

Upvotes: 0

Views: 476

Answers (2)

Yannick Pezeu

Reputation: 818

A good introduction can be found in the Beam programming guide:

https://beam.apache.org/documentation/programming-guide/

Upvotes: 0

Iñigo

Reputation: 2680

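p is the variable that holds the pipeline object (p = beam.Pipeline()). It is where the pipeline starts; this starting point is also referred to as PBegin.

| applies a PTransform (operation) to whatever is on its left, so it separates consecutive PTransforms.

>> is used between the | and the PTransform when you want to give the transform a name.

The parentheses are there so that Python accepts the expression being split across multiple lines.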
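
To make the operators less mysterious: they are ordinary Python operators that Beam overloads, not special syntax. The sketch below is a toy illustration of that idea only. ToyTransform, toy_map, toy_flat_map and applied_labels are hypothetical names invented here, not part of Beam, and unlike Beam (which builds a deferred pipeline graph) this toy runs each step immediately on a plain list.

```python
# Toy stand-ins, NOT Apache Beam's real classes: they only show how
# Python operator overloading can make `data | 'label' >> transform` work.
applied_labels = []  # records the order in which named steps run


class ToyTransform:
    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label

    def __rrshift__(self, label):
        # Runs for `'label' >> transform`: str has no `>>`, so Python
        # falls back to the right operand's reflected method.
        return ToyTransform(self.fn, label)

    def __ror__(self, items):
        # Runs for `items | transform`: list does not support `|`,
        # so Python again falls back to the right operand.
        applied_labels.append(self.label)
        return self.fn(items)


def toy_map(fn):
    # Apply fn to every element.
    return ToyTransform(lambda items: [fn(x) for x in items])


def toy_flat_map(fn):
    # Apply fn to every element and flatten the resulting lists.
    return ToyTransform(lambda items: [y for x in items for y in fn(x)])


lines = ["to be or", "not to be"]

# `>>` binds more tightly than `|`, so each step is named first and
# then applied. The outer parentheses only allow the line breaks.
counts = (lines
          | 'split' >> toy_flat_map(str.split)
          | 'pair_with_one' >> toy_map(lambda word: (word, 1)))

print(counts)
print(applied_labels)
```

Because >> has higher precedence than | in Python, lines | 'split' >> transform is read as lines | ('split' >> transform), which is why the label ends up attached to the transform rather than to the data.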

There's a set of tutorials in GCP that starts from the very basics, with exercises in every "chapter". They are notebooks. To get them, go to "Dataflow > Notebooks > Create Instance", then "Open Jupyterlab Notebook"; there should be a folder called Tutorials. Disclaimer: you'd need to pay for the instance hours, and I was part of the team that added them.

There's also something called Katas, which is free, but I haven't gone through it thoroughly, so I'm not sure whether it starts from the very beginning.

Upvotes: 3
