m81

Reputation: 2317

Can Oozie pause a workflow until a certain file is generated/exists?

I'm using Oozie for the first time and finding it a bit hard to parse the specification. I'm trying to create a simple workflow in which I run some queries in Hive, then execute a shell action in order to do some analysis with a different program, and then finally I'd like to execute a Java job through Oozie.

While I understand how to do all of these actions in isolation, how do I set up my workflow so that the final Java job waits for a file to be generated before starting? Googling around, I see ways to make an Oozie workflow wait for a dataset to be generated before it starts, but I don't want the entire workflow to wait. I only want one particular action within the workflow to wait for the input file.

The input file will be something simple - most likely I'll just have the second action, the shell one, execute some command like touch $(date -u "+%Y-%m-%d-%H").done right before it exits, so that my input file would be a zero-byte file with a name like 2015-07-20-14.done.

Upvotes: 0

Views: 3901

Answers (2)

srinivasan Hariharan

Reputation: 365

Create a coordinator that checks for a dataset at a specified HDFS location at a given frequency.

Sample coordinator

<coordinator-app name="FILE_CHECK" frequency="1440" start="2009-02-01T00:00Z" end="2009-02-07T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
   <datasets>
      <dataset name="datafile" frequency="60" initial-instance="2009-01-01T00:00Z" timezone="UTC">
         <uri-template>hdfs://<URI>:<PORT>/data/feed/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      </dataset>
   </datasets>
   <input-events>
      <data-in name="coorddatafile" dataset="datafile">
          <start-instance>${coord:current(-23)}</start-instance>
          <end-instance>${coord:current(0)}</end-instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>hdfs://<URI>:<PORT>/workflows</app-path>
      </workflow>
   </action>     
</coordinator-app>
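
Note that a coordinator gates the start of the whole workflow it launches, so to delay only the Java step, one option is to move that step into its own workflow and let a coordinator like this trigger it. As a rough sketch, assuming the marker file is written to a hypothetical HDFS directory /data/flags (for example with hdfs dfs -touchz, since a plain touch in a shell action creates a local file on the worker node), the dataset could point directly at the marker, with an empty done-flag so that the existence of the path itself marks the instance as ready:

```xml
<!-- Sketch only: /data/flags is an assumed location, not taken from the question -->
<dataset name="donefile" frequency="${coord:hours(1)}" initial-instance="2015-07-20T00:00Z" timezone="UTC">
   <uri-template>${nameNode}/data/flags/${YEAR}-${MONTH}-${DAY}-${HOUR}.done</uri-template>
   <!-- Empty done-flag: the path itself must exist; if the element is omitted, Oozie looks for a _SUCCESS file instead -->
   <done-flag></done-flag>
</dataset>
```

The matching data-in would then reference a single instance, such as ${coord:current(0)}, rather than a 24-hour range.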

Upvotes: 2

K S Nidhin

Reputation: 2650

You could use a decision node here.

Check for the file in a switch case, and as soon as the file exists, route to your next Java action.

----EDIT------

Below is an example. As written, it does not solve your case:

 <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>echo</exec>
        <argument>my_output=Hello Oozie</argument>
        <capture-output/>
    </shell>
    <ok to="check-output"/>
    <error to="fail"/>
</action>
<decision name="check-output">
    <switch>
        <case to="end">
            ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
        </case>
        <default to="fail-output"/>
    </switch>
</decision>
<kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<kill name="fail-output">
    <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
</kill>
<end name="end"/>
</workflow-app>

Likewise, you could add your action nodes as targets of the switch cases and continue accordingly.
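
For the file check itself, a decision node could look roughly like the sketch below. It uses the fs:exists EL function; doneFile is an assumed job property holding the full HDFS path of the marker file, and java-node is a hypothetical name for the Java action. Keep in mind that a decision node is evaluated once and Oozie workflows cannot contain cycles, so this tests for the file rather than waiting for it:

```xml
<!-- Sketch only: doneFile and java-node are assumed names, not part of the example above -->
<decision name="check-done-file">
    <switch>
        <!-- fs:exists returns true when the given HDFS path exists -->
        <case to="java-node">
            ${fs:exists(doneFile)}
        </case>
        <!-- File not there yet: stop instead of looping, since workflow cycles are not allowed -->
        <default to="fail-output"/>
    </switch>
</decision>
```

If the shell action writes the marker to HDFS just before it exits, the file should already exist by the time this decision runs.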

Upvotes: 0
