greperror

Reputation: 5676

How to make oozie job get triggered when a success file is present in aws S3

I am working with oozie to perform a HDFS data transfer operation, the requirement is to trigger oozie workflow job whenever there is data available in aws S3 bucket. I am thinking of keeping a success file along with data files in my S3 bucket, but I am not sure, how to make oozie coordinator periodically read from S3 to check if the success file is available or not. It would be great, if somebody can provide the sample coordinator.xml for the same.

Upvotes: 1

Views: 2147

Answers (1)

Deepan Ram

Reputation: 850

Can you try out the coordinator below — set the done-flag to the name of your success file (for example `_SUCCESS`, the standard Hadoop marker) and the app-path to the location of your workflow:

<coordinator-app name="FILE_CHECK" frequency="1440" start="2017-04-17T00:00Z" end="2018-04-17T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
   <datasets>
      <dataset name="datafile" frequency="60" initial-instance="2017-04-16T00:00Z" timezone="UTC">
         <uri-template>s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}</uri-template>
         <done-flag>_SUCCESS</done-flag>
      </dataset>
   </datasets>
   <input-events>
      <data-in name="coorddatafile" dataset="datafile">
         <instance>${coord:current(0)}</instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
         <configuration>
            <property>
               <name>fileDirectory</name>
               <value>${coord:dataIn('coorddatafile')}</value>
            </property>
         </configuration>
      </workflow>
   </action>
</coordinator-app>

You can also refer to: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Getting-Oozie-Coordinator-datasets-working-with-S3-after-a-lost/td-p/27233
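To see how the coordinator turns a nominal time into a concrete S3 path to poll, here is a small illustrative Python sketch (not part of Oozie — the function name and logic are my own) that expands the `${YEAR}`/`${MONTH}`/`${DAY}` variables of the `uri-template` the way a materialized dataset instance would:

```python
from datetime import datetime

def resolve_uri(template: str, nominal_time: datetime) -> str:
    """Expand Oozie-style ${YEAR}/${MONTH}/${DAY} variables in a dataset
    uri-template for the given nominal time (illustrative helper only)."""
    return (template
            .replace("${YEAR}", f"{nominal_time.year:04d}")
            .replace("${MONTH}", f"{nominal_time.month:02d}")
            .replace("${DAY}", f"{nominal_time.day:02d}"))

template = "s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}"
print(resolve_uri(template, datetime(2017, 4, 17)))
# s3n://mybucket/a/b/2017/04/17
```

The coordinator then waits until the done-flag file exists directly under that resolved directory before starting the workflow, which is why writing the success file last (after all data files are uploaded) is what actually triggers the job.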

Upvotes: 1
