Subhradip Bose

Reputation: 3305

Want to run Apache Beam Pipeline in parallel

My problem statement is:

1. I need to fetch data from multiple third-party sources, perform some operations on it, and store the results in some location.

2. I need to create a dedicated Beam pipeline for each source.

As I am new to Beam, my questions are:

1. If I create separate pipelines for the different third-party sources, is that a good design, or can it cause problems?

2. If the design is sound and I run the pipelines with beam-runners-direct-java on a single machine, will they be processed in parallel?

Upvotes: 0

Views: 1497

Answers (1)

Ruoyun Huang

Reputation: 173

Beam's long-term plan is to support many different sources (and eventually even sources written in other languages).

To answer your questions: running multiple beam-runners-direct-java pipelines in parallel on a single machine won't cause problems. In fact, all the validation tests use the direct runner, and those tests do run in parallel.
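As a rough illustration of launching several independent jobs side by side in one JVM, here is a minimal sketch that uses only the JDK's `ExecutorService`. It is not Beam code: `ParallelSourceJobs`, `runJobFor`, and `runAll` are hypothetical names, and `runJobFor` is only a stand-in for the place where you would build one source's pipeline and call `run().waitUntilFinish()`. The thread pool is there because, as far as I know, the direct runner blocks on `run()` by default, so pipelines started one after another on the same thread would execute sequentially.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSourceJobs {

    // Hypothetical stand-in for one per-source pipeline. In a real Beam program,
    // this is where you would build the Pipeline for `source` and run it to completion.
    static String runJobFor(String source) {
        return source + ":done";
    }

    // Submits one job per source to a thread pool and waits for all of them.
    static Map<String, String> runAll(List<String> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            // Submit everything first so the jobs can overlap in time.
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String source : sources) {
                futures.put(source, pool.submit(() -> runJobFor(source)));
            }
            // Then collect the results, blocking until each job has finished.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : futures.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("sourceA", "sourceB", "sourceC")));
    }
}
```

Each job here is fully independent, which matches the "one pipeline per source" design in the question. Whether that design is worth it is a separate matter.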

One thing is unclear: what is the main reason you have to create multiple pipelines, one for each third-party source? If the reason is to run things in parallel for higher throughput, I (biased opinion) think that is not a good idea. In the long run, even if Beam introduces a feature that optimizes reading from parallel sources, separate pipelines won't be able to benefit from that optimization.

Upvotes: 2
