Reputation: 967
I have read through the Beam documentation and also looked through Python documentation but haven't found a good explanation of the syntax being used in most of the example Apache Beam code.
Can anyone explain what the _, |, and >> are doing in the code below? Also, is the text in quotes, i.e. 'ReadTrainingData', meaningful, or could it be exchanged with any other label? In other words, how is that label being used?
train_data = pipeline | 'ReadTrainingData' >> _ReadData(training_data)
evaluate_data = pipeline | 'ReadEvalData' >> _ReadData(eval_data)
input_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)
_ = (input_metadata
     | 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
         os.path.join(output_dir, path_constants.RAW_METADATA_DIR),
         pipeline=pipeline))
preprocessing_fn = reddit.make_preprocessing_fn(frequency_threshold)
(train_dataset, train_metadata), transform_fn = (
    (train_data, input_metadata)
    | 'AnalyzeAndTransform' >> tft.AnalyzeAndTransformDataset(
        preprocessing_fn))
Upvotes: 68
Views: 9238
Reputation: 3240
No one mentioned the _, so just for completeness:
You could use any variable name instead of _, but it is taken as good practice to assign a value that is returned but which you do not care about to _. This makes it obvious to readers of your code that you plan to throw it away.
The previous value of _ is discarded when you re-assign it (overwrite it).
There is another advantage that _ has: because it is the "throwaway" variable, most linters and other code clarity helpers treat it differently.
If you assign a result to a variable such as use_me and never actually use it, a linter will warn that you have an unused variable. And if you have rigorous code quality restrictions, maybe you cannot even merge your code into production with an unused variable.
_ is not caught by the linter (and could be merged into a strict code base) because it is understood to be a throwaway variable, and therefore there is no mistake in your code (at least not in this regard).
Upvotes: 2
Reputation: 1493
Operators in Python can be overloaded. In Beam, | is a synonym for apply, which applies a PTransform to a PCollection to produce a new PCollection. >> allows you to name a step for easier display in various UIs -- the string between the | and the >> is only used for these display purposes and for identifying that particular application.
See https://beam.apache.org/documentation/programming-guide/#transforms
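To make the overloading concrete, here is a minimal sketch in plain Python. The Transform and Collection classes are hypothetical stand-ins written for this answer, not Beam's actual PTransform and PCollection implementations; they only show how | and >> can be given custom meanings through the special methods __or__ and __rrshift__.

```python
class Transform:
    """Toy stand-in for a PTransform (hypothetical, not Beam's class)."""

    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label or fn.__name__

    def __rrshift__(self, label):
        # Invoked for `'SomeLabel' >> transform`: str does not know how to
        # right-shift by a Transform, so Python falls back to this method.
        return Transform(self.fn, label)


class Collection:
    """Toy stand-in for a PCollection (hypothetical, not Beam's class)."""

    def __init__(self, items, applied=()):
        self.items = list(items)
        self.applied = list(applied)  # labels of the steps applied so far

    def __or__(self, transform):
        # Invoked for `collection | transform`: apply the transform and
        # record its label, returning a new Collection.
        return Collection(transform.fn(self.items),
                          self.applied + [transform.label])


def double(items):
    return [x * 2 for x in items]


numbers = Collection([1, 2, 3])
# >> binds more tightly than |, so the label is attached first and the
# labelled transform is then applied to the collection.
doubled = numbers | 'DoubleEverything' >> Transform(double)
print(doubled.items)    # [2, 4, 6]
print(doubled.applied)  # ['DoubleEverything']
```

Swapping 'DoubleEverything' for any other string changes only the recorded label, not the result, which mirrors how the label behaves in the question's pipeline.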
Upvotes: 88