Reputation: 923
I am studying the Python syntax at this URL:
https://beam.apache.org/get-started/wordcount-example/#applying-pipeline-transforms
I see this syntax:
# The Flatmap transform is a simplified version of ParDo.
| 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
What is 'ExtractWords' ?
Is it the name of a function in the beam API?
Is it a comment?
Is it some kind of annotation for the line it resides in?
Why is 'ExtractWords' there?
Upvotes: 0
Views: 47
Reputation: 17913
It is a human-readable unique label for this particular transform in your pipeline. It doesn't have any meaning. It can be any string, and it is used for debugging information (e.g. if a transform fails, show you which one it was), for displaying in a UI (e.g. in the Dataflow UI), for aligning the old structure to the new structure of the pipeline when performing pipeline update, etc.
E.g.:
p | 'Read click logs' >> beam.ReadFromText(...)
| 'Analyze user statistics' >> ...
| 'Write statistics to my favorite BigQuery table' >> beam.io.WriteToBigQuery(...)
Upvotes: 1