Reputation: 1288
I am designing a text mining pipeline in UIMA DUCC as follows:
|-----------------|
|                 | ==CAS_1==> Pipeline A ==> Consumer A
| CAS Multiplier  | ==CAS_2==> Pipeline B ==> Consumer B
|                 | ==CAS_3==> Pipeline C ==> Consumer C
|-----------------|
I intend to run Pipelines A, B and C in parallel. I believe this can be done using a flow controller. Is my understanding right? If so, how do I define multiple CCs? The process_descriptor_CC field in the job description file takes only one consumer. How can we pass multiple consumers and their pipeline associations?
Upvotes: 1
Views: 185
Reputation: 21
First, you need to understand the flow controller. Create an aggregate descriptor that uses a flow controller, and add each CAS consumer descriptor to it as a delegate, just as you would add an analysis engine descriptor.
After this, there are two options for your scenario:
1. Use only process_descriptor_CR and process_descriptor_AE, and supply the flow-controller-based aggregate descriptor as the AE.
2. Use only process_descriptor_CR and process_dd, and reference the flow-controller-based aggregate descriptor from the deployment descriptor.
Upvotes: 1
Reputation: 123
If the intention is to process a large collection of documents with high throughput then the three pipelines, each including its CAS consumer, would all be in the AE (process_descriptor_AE) and the AE would include a custom flow controller to route CASes as desired. CASes in an AE would run one at a time, but multiple CM+AE threads could be run in parallel by specifying the number of JP threads (process_thread_count) to be greater than 1.
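The routing decision described above can be sketched in plain Java. This is only an illustration of the logic, not UIMA code: it uses no UIMA classes, and the class name, the delegate keys (PipelineA, ConsumerA, and so on), and the idea that the CAS multiplier tags each CAS with a label such as CAS_1 are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of the routing decision a custom flow controller would make.
 * No UIMA classes are used; every name here is hypothetical.
 */
public class RoutingSketch {

    /** Hypothetical delegate keys for each branch: the pipeline first, then its consumer. */
    static final Map<String, List<String>> ROUTES = Map.of(
        "CAS_1", List.of("PipelineA", "ConsumerA"),
        "CAS_2", List.of("PipelineB", "ConsumerB"),
        "CAS_3", List.of("PipelineC", "ConsumerC"));

    /** Per-CAS flow: hands out one delegate key at a time, then null when finished. */
    static final class Flow {
        private final Deque<String> remaining;

        Flow(List<String> steps) {
            this.remaining = new ArrayDeque<>(steps);
        }

        /** Returns the next delegate key, or null once every step has been visited. */
        String next() {
            return remaining.poll();
        }
    }

    /** Picks the route for a CAS from a label assumed to be set by the CAS multiplier. */
    static Flow computeFlow(String casLabel) {
        List<String> steps = ROUTES.get(casLabel);
        if (steps == null) {
            throw new IllegalArgumentException("No route for " + casLabel);
        }
        return new Flow(steps);
    }

    public static void main(String[] args) {
        // Walk each labelled CAS through its own branch and print the path taken.
        for (String label : List.of("CAS_1", "CAS_2", "CAS_3")) {
            Flow flow = computeFlow(label);
            StringBuilder path = new StringBuilder(label);
            for (String step = flow.next(); step != null; step = flow.next()) {
                path.append(" -> ").append(step);
            }
            System.out.println(path);
        }
    }
}
```

In a real aggregate, the same decision would presumably live in a flow controller class (for example one extending CasFlowController_ImplBase), with the flow object's next() method returning a step for the chosen delegate key and a final step at the end. How the branch is detected from the CAS (a feature value, a type, a view name) depends on what the CAS multiplier writes.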
Upvotes: 1
Reputation: 413
Create a flow controller and add each CAS consumer as a delegate analysis engine. This way you can add as many as you want. Then give the path of the flow controller in the deployment descriptor, and give the deployment descriptor's path in the job specification.
Upvotes: 0