Reputation: 1092
I have a simple input file with two columns, like this:
pkg1 date1
pkg2 date2
pkg3 date3
...
...
I want to create an Oozie workflow that processes each row separately. For each row, I want to run multiple actions one after another (Hive, Pig, ...) and then move on to the next row. This is more difficult than I expected. I think I have to create a loop somehow and iterate through the rows.
Can you give me architectural advice on how to achieve this?
Upvotes: 0
Views: 62
Reputation: 9067
I totally agree with @Mattinbits: you must use some procedural code (shell script, Python, etc.) to run the loop and fire the appropriate Pig/Hive tasks.
But if your process must wait for the tasks to complete before launching the next batch, the coordination part might become a bit more complicated to implement. I can think of a very evil way to use Oozie for that coordination...
Of course, there are some other things to take care of: generating unique names for the sub-workflow actions, chaining them, handling errors. The usual stuff.
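The approach is not spelled out above, so the following is only one possible reading, offered as an assumption: generate a workflow definition on the fly with one sub-workflow action per input row, each chained to the next through its `ok` transition. The function name `build_workflow`, the action names `row_0`, `row_1`, ..., the property names `pkg` and `date`, and the sub-workflow path are all hypothetical. A minimal sketch:

```python
import xml.etree.ElementTree as ET


def build_workflow(rows, sub_workflow_path, name="per-row-chain"):
    """Return workflow XML that runs one sub-workflow per (pkg, date) row, in order."""
    root = ET.Element(
        "workflow-app", {"name": name, "xmlns": "uri:oozie:workflow:0.4"}
    )
    # Unique action name per row (hypothetical naming scheme).
    names = [f"row_{i}" for i in range(len(rows))]
    ET.SubElement(root, "start", {"to": names[0] if names else "end"})

    for i, (pkg, date) in enumerate(rows):
        action = ET.SubElement(root, "action", {"name": names[i]})
        sub = ET.SubElement(action, "sub-workflow")
        ET.SubElement(sub, "app-path").text = sub_workflow_path
        ET.SubElement(sub, "propagate-configuration")
        conf = ET.SubElement(sub, "configuration")
        # Pass the row's values to the sub-workflow as properties.
        for key, value in (("pkg", pkg), ("date", date)):
            prop = ET.SubElement(conf, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value
        # Chain to the next row, or to the end node after the last row.
        next_node = names[i + 1] if i + 1 < len(names) else "end"
        ET.SubElement(action, "ok", {"to": next_node})
        # Any failing row sends the whole chain to the kill node.
        ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "A per-row sub-workflow failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")
```

Because the chain is unrolled at generation time, the resulting definition stays acyclic. The generated XML would still have to be uploaded to HDFS and submitted, which this sketch does not cover.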
Upvotes: 1
Reputation: 10428
Oozie does not support loops/cycles, since an Oozie workflow definition must be a Directed Acyclic Graph (DAG):
https://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a2.1_Cycles_in_Workflow_Definitions
Also, there is no built-in way (that I'm aware of) to read data from Hive into an Oozie workflow and use it to control the flow of that workflow.
You could have a single Oozie workflow which launches some custom process (e.g. a Shell Action), and within that process read the data from Hive, and launch a new, separate, Oozie workflow for each entry.
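As a rough illustration of such a custom process, here is a minimal Python sketch. It reads the rows from a plain two-column text file (as in the question) rather than from Hive, and it waits for each workflow to finish before starting the next one. The server URL, the `job.properties` file, the property names `pkg` and `date`, and all function names are assumptions, and the exact CLI flags and output format should be checked against your Oozie version:

```python
import re
import subprocess
import time

OOZIE_URL = "http://localhost:11000/oozie"  # assumption: adjust to your server
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}


def parse_rows(text):
    """Return (pkg, date) tuples from whitespace-separated two-column lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            rows.append((parts[0], parts[1]))
    return rows


def build_run_command(pkg, date, properties="job.properties"):
    """Build the CLI call that submits and starts one workflow for one row."""
    return ["oozie", "job", "-oozie", OOZIE_URL, "-config", properties,
            f"-Dpkg={pkg}", f"-Ddate={date}", "-run"]


def parse_job_id(output):
    """Extract the ID from the 'job: <id>' line printed after submission."""
    match = re.search(r"^job:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def parse_status(output):
    """Extract the job status from the 'Status : <value>' line of -info output."""
    match = re.search(r"^Status\s*:\s*(\w+)", output, re.MULTILINE)
    return match.group(1) if match else None


def run_row(pkg, date, poll_seconds=30):
    """Start one workflow and block until it reaches a terminal status."""
    out = subprocess.run(build_run_command(pkg, date),
                         capture_output=True, text=True, check=True).stdout
    job_id = parse_job_id(out)
    if job_id is None:
        raise RuntimeError(f"could not find a job ID in: {out!r}")
    while True:
        info = subprocess.run(
            ["oozie", "job", "-oozie", OOZIE_URL, "-info", job_id],
            capture_output=True, text=True, check=True).stdout
        status = parse_status(info)
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)


def process_file(path):
    """Process rows strictly one after another; stop at the first failure."""
    with open(path) as handle:
        rows = parse_rows(handle.read())
    for pkg, date in rows:
        status = run_row(pkg, date)
        if status != "SUCCEEDED":
            raise RuntimeError(f"workflow for {pkg} {date} ended as {status}")
```

Calling `process_file("input.txt")` from the Shell Action's script would then run one workflow per row in sequence. The per-row workflow itself (the Hive and Pig actions) stays an ordinary acyclic definition that reads `${pkg}` and `${date}`.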
Upvotes: 1