Reputation: 40
The Apache Beam documentation "Authoring I/O Transforms - Overview" states:
Reading and writing data in Beam is a parallel task, and using ParDos, GroupByKeys, etc… is usually sufficient. Rarely, you will need the more specialized Source and Sink classes for specific features.
Could someone please provide a very basic example of how to do this in Python?
For example, if I had a local folder containing 100 jpeg images, how would I read them into a pipeline, process them, and write the results out?
Thanks,
Upvotes: 2
Views: 351
Reputation: 779
Here is an example of a pipeline: https://github.com/apache/beam/blob/fc738ab9ac7fdbc8ac561e580b1a557b919437d0/sdks/python/apache_beam/examples/wordcount.py#L37
In your case, get the names of the files first, then read each file one at a time and write the output. You might also want to push the file names through a GroupByKey shuffle so the runner can parallelize the per-file work. In total, your pipeline might look something like: Read list of filenames -> Send filenames through a GroupByKey shuffle -> Get one filename at a time in a ParDo -> Read, process, and write that single file in a ParDo.
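A minimal sketch of that pattern, assuming a hypothetical input folder /tmp/images and a placeholder "processing" step that just records each file's size in bytes; swap those in for your real paths and logic:

```python
import glob
import apache_beam as beam

def read_and_process(filename):
    """Read one image and emit 'filename,size'; the processing here is a stand-in."""
    with open(filename, 'rb') as f:
        data = f.read()
    # ... real image processing would go here ...
    yield '%s,%d' % (filename, len(data))

with beam.Pipeline() as p:
    (p
     | 'ListFiles' >> beam.Create(glob.glob('/tmp/images/*.jpg'))  # hypothetical folder
     | 'PairWithKey' >> beam.Map(lambda name: (name, None))        # key each name so it can be grouped
     | 'Shuffle' >> beam.GroupByKey()                              # redistribute filenames across workers
     | 'ExtractName' >> beam.Map(lambda kv: kv[0])                 # recover the bare filename
     | 'ReadProcess' >> beam.FlatMap(read_and_process)             # one file per ParDo invocation
     | 'Write' >> beam.io.WriteToText('/tmp/output/sizes'))        # hypothetical output prefix
```

On recent SDK versions, beam.Reshuffle() expresses the same key/group/unkey shuffle step as a single transform.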
Upvotes: 1