Reputation: 5676
I am working with Oozie to perform an HDFS data transfer operation. The requirement is to trigger an Oozie workflow job whenever data becomes available in an AWS S3 bucket. I am thinking of keeping a success file along with the data files in my S3 bucket, but I am not sure how to make an Oozie coordinator periodically check S3 for the presence of that success file. It would be great if somebody could provide a sample coordinator.xml for this.
Upvotes: 1
Views: 2147
Reputation: 850
Can you try out the coordinator below:
<coordinator-app name="FILE_CHECK" frequency="1440" start="2017-04-17T00:00Z" end="2018-04-17T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <dataset name="datafile" frequency="60" initial-instance="2017-04-16T00:00Z" timezone="UTC">
            <uri-template>s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}</uri-template>
            <!-- The coordinator only materializes an action once this flag file exists
                 in the resolved dataset directory; replace with your success file name -->
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="coorddatafile" dataset="datafile">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <!-- HDFS path of the directory containing your workflow.xml -->
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>fileDirectory</name>
                    <value>${coord:dataIn('coorddatafile')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
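The coordinator only triggers the run; the actual S3-to-HDFS copy happens in the workflow it points to. Below is a minimal sketch of such a workflow.xml using a distcp action (a shell or fs action would also work). The property fileDirectory is the one passed in from the coordinator above; jobTracker, nameNode and targetDirectory are placeholder names you would define in your job.properties:

<workflow-app name="s3-to-hdfs-copy" xmlns="uri:oozie:workflow:0.4">
    <start to="copy-from-s3"/>
    <action name="copy-from-s3">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- source: the S3 directory resolved by the coordinator; target: HDFS destination -->
            <arg>${fileDirectory}</arg>
            <arg>${targetDirectory}</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Copy from S3 failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>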
You can also refer to: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Getting-Oozie-Coordinator-datasets-working-with-S3-after-a-lost/td-p/27233
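For submitting the coordinator, a job.properties along these lines would wire the placeholders together (host names and paths below are examples, replace with your own):

# Cluster endpoints
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8032

# HDFS directory where the coordinator.xml above is deployed
oozie.coord.application.path=${nameNode}/user/oozie/apps/file-check

# Workflow referenced by the coordinator and the HDFS target of the copy
workflowAppUri=${nameNode}/user/oozie/apps/s3-to-hdfs-copy
targetDirectory=${nameNode}/data/incoming

Then run: oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

Also note that for the dataset polling itself to reach S3, the Oozie server's Hadoop configuration needs the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties set, which is what the linked Cloudera thread discusses.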
Upvotes: 1