user3676943

Reputation: 923

In Apache Beam, what does 'ExtractWords' do?

I am studying the Python syntax at this URL:

https://beam.apache.org/get-started/wordcount-example/#applying-pipeline-transforms

I see this syntax:

# The Flatmap transform is a simplified version of ParDo.
| 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))

What is 'ExtractWords'?

Is it the name of a function in the beam API?

Is it a comment?

Is it some kind of annotation for the line it resides in?

Why is 'ExtractWords' there?

Upvotes: 0

Views: 47

Answers (1)

jkff

Reputation: 17913

It is a human-readable label that identifies this particular transform in your pipeline, and it must be unique within that pipeline. It has no effect on what the transform does. It can be any string, and it is used for debugging information (e.g. if a transform fails, it shows you which one it was), for display in a UI (e.g. the Dataflow UI), for matching the old pipeline structure to the new one when performing a pipeline update, etc.

E.g.:

# Wrapped in parentheses so the chain can span several lines.
(p
 | 'Read click logs' >> beam.io.ReadFromText(...)
 | 'Analyze user statistics' >> ...
 | 'Write statistics to my favorite BigQuery table' >> beam.io.WriteToBigQuery(...))
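If the `'label' >> transform` syntax itself looks odd: it is ordinary Python operator overloading. A string has no `>>` operator of its own, so Python falls back to the reflected method `__rrshift__` on the object to the right of it. Below is a minimal toy sketch of that mechanism. `ToyTransform` is a hypothetical stand-in written for illustration, not Beam's actual class, and it works on a plain list rather than a PCollection.

```python
import re


class ToyTransform:
    """Hypothetical stand-in for a labeled transform (not Beam's real class)."""

    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label or "<unnamed>"

    def __rrshift__(self, label):
        # Called for `'some label' >> transform`, because str does not
        # implement >> for this operand. Returns a labeled copy.
        return ToyTransform(self.fn, label)

    def __ror__(self, items):
        # Called for `items | transform`: flat-map fn over a plain list.
        try:
            return [out for item in items for out in self.fn(item)]
        except Exception as exc:
            # The label is only used to say which step failed.
            raise RuntimeError(f"step {self.label!r} failed") from exc


extract = 'ExtractWords' >> ToyTransform(lambda x: re.findall(r"[A-Za-z']+", x))
words = ["the cat", "sat down"] | extract

print(extract.label)  # ExtractWords
print(words)          # ['the', 'cat', 'sat', 'down']
```

Renaming the label to any other string leaves `words` unchanged, which is the sense in which the label carries no meaning for the computation. It only shows up in places such as the error message.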

Upvotes: 1
