Reputation: 117
We have a requirement to create separate threads for reading multiple files (PCollection<String>). Can I execute a ParDo operation in a multithreaded environment and create a PCollection<String, String> from a PCollection<String>? Could you please tell me whether this is possible and whether it is a recommended approach?
Upvotes: 3
Views: 2665
Reputation: 11031
It sounds like what you want can be done with Beam. In the Beam model, you do not define how you want your operations to run, but rather what operations you want to perform; Beam and the underlying runner then take care of managing threads.
That's why you generally shouldn't manage your own threads to read files in Beam. You should use TextIO to read from plain text files, and the TextIO module should read the files in parallel.
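As a rough illustration of the point above, here is a minimal sketch in the Beam Java SDK that reads text files with TextIO and then uses a ParDo to turn each line into a key/value pair, without creating any threads by hand. The file pattern, the class and helper names, and the "key,value" line format are all assumptions made for the example; they are not from the question.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class ReadFilesSketch {

  // Hypothetical helper: splits a line on its first comma.
  // The "key,value" line format is an assumption for this sketch.
  static KV<String, String> toKeyValue(String line) {
    int comma = line.indexOf(',');
    if (comma < 0) {
      return KV.of(line, "");
    }
    return KV.of(line.substring(0, comma), line.substring(comma + 1));
  }

  // The runner decides how many workers and threads invoke this DoFn;
  // the pipeline code itself starts no threads.
  static class ToKeyValueFn extends DoFn<String, KV<String, String>> {
    @ProcessElement
    public void processElement(
        @Element String line, OutputReceiver<KV<String, String>> out) {
      out.output(toKeyValue(line));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder glob: every matching file is read by TextIO.
    PCollection<String> lines =
        pipeline.apply("ReadFiles", TextIO.read().from("/path/to/input/*.txt"));

    // PCollection<String> -> PCollection<KV<String, String>>
    PCollection<KV<String, String>> pairs =
        lines.apply("ToKeyValue", ParDo.of(new ToKeyValueFn()));

    pipeline.run().waitUntilFinish();
  }
}
```

Note that a keyed collection in Beam is typed as PCollection<KV<String, String>> rather than PCollection<String, String>. Running this needs a runner (for example the direct runner) on the classpath.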
There are a few cases when your files will not be able to be read in parallel:
If you have a very large number of files (thousands or more), use TextIO.readAll instead of the normal TextIO implementation, because keeping track of thousands of files that are being read in parallel can overwhelm the system.
Let me know if you are using non-plain-text files, or some other kind of source.
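For the TextIO.readAll case mentioned above, a minimal sketch might look like the following. TextIO.readAll takes a PCollection<String> of file names or patterns as its input, which matches the shape described in the question. The patterns and the class and method names here are placeholders invented for the example. In recent Beam releases TextIO.readAll is, as far as I know, marked deprecated in favour of FileIO matching plus TextIO.readFiles, so check the Javadoc for the version you are on.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class ReadAllSketch {

  // Hypothetical list of file patterns; in a real pipeline these would
  // usually come from an upstream step rather than being hard-coded.
  static List<String> filePatterns() {
    return Arrays.asList("/path/to/input/a/*.txt", "/path/to/input/b/*.txt");
  }

  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The file names or patterns are themselves elements of a PCollection.
    PCollection<String> patterns =
        pipeline.apply("FilePatterns", Create.of(filePatterns()));

    // Each pattern is expanded and the matching files are read as lines.
    PCollection<String> lines = patterns.apply("ReadAll", TextIO.readAll());

    pipeline.run().waitUntilFinish();
  }
}
```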
Upvotes: 2