Reputation: 818
I see this "p | " at the beginning of Beam pipelines and I do not understand what this p means.
Example code of a Beam pipeline from this introductory tutorial: https://beam.apache.org/documentation/sdks/python-streaming/
lines = p | 'read' >> ReadFromText(known_args.input)
...
counts = (lines
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(six.text_type))
          | 'pair_with_one' >> beam.Map(lambda x: (x, 1))
          | 'group' >> beam.GroupByKey()
          | 'count' >> beam.Map(count_ones))
...
output = counts | 'format' >> beam.Map(format_result)
# Write the output using a "Write" transform that has side effects.
output | 'write' >> WriteToText(known_args.output)
What I understand:
I understand the concepts of PCollection and PTransform, and the aim of Beam, which is to treat streaming data and batch data the same way.
What I don't understand:
What is this p? What are the parentheses for? The pipes? The >>? It looks like bash-style code, but it is not explained anywhere.
Does anyone have an explanation, or a link to a tutorial that starts from the beginning?
Upvotes: 0
Views: 476
Reputation: 818
A good introduction can be found in the programming guide:
https://beam.apache.org/documentation/programming-guide/
Upvotes: 0
Reputation: 2680
p is the variable that holds the pipeline (p = beam.Pipeline()). It is the starting point of the pipeline, also referred to as PBegin.
| separates each PTransform (operation) and applies it to whatever is on its left.
>> goes between the | and the PTransform when you want to give that step a name.
The parentheses are there so that Python lets the expression span multiple lines; without them, each line break would need a backslash.
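None of this is special syntax: | and >> are ordinary Python operators, and as far as I know Beam overloads them (__or__ for |, __rrshift__ for the label on the left of >>). Here is a minimal sketch of that mechanism in plain Python, with no Beam installed. Transform, Collection, Map and Filter below are toy stand-ins I made up for illustration, not Beam's actual classes.

```python
class Transform:
    """Toy stand-in for a PTransform: wraps a function over a list."""

    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label

    def __rrshift__(self, label):
        # Python calls this for `'some label' >> transform`, because
        # str does not know how to handle >> with a Transform.
        return Transform(self.fn, label)


class Collection:
    """Toy stand-in for a PCollection: holds items and the labels applied."""

    def __init__(self, items, applied=()):
        self.items = list(items)
        self.applied = list(applied)

    def __or__(self, transform):
        # Python calls this for `collection | transform`.
        return Collection(transform.fn(self.items),
                          self.applied + [transform.label])


def Map(fn):
    # Hypothetical helper: apply fn to every element.
    return Transform(lambda items: [fn(x) for x in items])


def Filter(pred):
    # Hypothetical helper: keep elements for which pred is true.
    return Transform(lambda items: [x for x in items if pred(x)])


# >> binds tighter than |, so each step reads as: collection | (label >> transform)
result = (Collection(["a", "bb", "ccc"])
          | 'lengths' >> Map(len)
          | 'keep_long' >> Filter(lambda n: n > 1))

print(result.items)    # [2, 3]
print(result.applied)  # ['lengths', 'keep_long']
```

The outer parentheses play the same role as in the Beam example: they let the chained expression continue across lines.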
There's a set of tutorials in GCP that start from the very basics, with exercises in every "chapter". They are notebooks. To get them, go to "Dataflow > Notebooks > Create Instance" and then "Open Jupyterlab Notebook"; there should be a folder called Tutorials. Disclaimer: you'd need to pay for the instance hours, and I was part of the team that added them.
There's also something called Katas, which is free, but I haven't gone through it thoroughly, so I'm not sure whether it starts from the very beginning.
Upvotes: 3