Reputation: 1117
I want to schedule an Oozie job based on a folder, i.e.
I have a folder in HDFS, and every day one file is added to that folder with a date-based name, date.txt (e.g. 20160802.txt).
I want to trigger an Oozie batch whenever a new file is added to that folder.
Please help me with how I can schedule this for my use case.
Thanks in advance.
Upvotes: 1
Views: 928
Reputation: 1766
Oozie workflow jobs are commonly run at regular time intervals and/or when data becomes available, and in some cases they can be triggered by an external event. This is where the Oozie coordinator comes into play.
You can use an Oozie coordinator to check the data dependency and trigger an Oozie workflow using the coordinator EL functions. In your case a file named after the date is added to HDFS every day, so you can achieve this with a dataset.
From the documentation:
Example: a dataset produced once every day at 00:15 PST8PDT, with done-flag set to empty:
<dataset name="logs" frequency="${coord:days(1)}"
initial-instance="2009-02-15T08:15Z" timezone="America/Los_Angeles">
<uri-template>
hdfs://foo:9000/app/logs/${market}/${YEAR}${MONTH}/${DAY}/data
</uri-template>
<done-flag></done-flag>
</dataset>
The dataset would resolve to the following URIs and Coordinator looks for the existence of the directory itself:
[market] will be replaced with the user-given property.
hdfs://foo:9000/app/logs/[market]/200902/15/data
hdfs://foo:9000/app/logs/[market]/200902/16/data
hdfs://foo:9000/app/logs/[market]/200902/17/data
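Applied to the layout in the question, a coordinator could look roughly like the sketch below. It is only a sketch under assumptions: the folder /data/incoming, the properties ${nameNode} and ${workflowAppPath}, the inputFile property name, the coordinator name and the start/end dates are all placeholders that are not in the question. It also assumes the file name follows the UTC date, and that an empty done-flag makes Oozie check for the file path itself (the documentation describes this for directories, so it is worth verifying on your cluster).

```xml
<!-- Sketch only: paths, names and dates are placeholders, not taken from the question. -->
<coordinator-app name="daily-file-coord" frequency="${coord:days(1)}"
                 start="2016-08-02T00:00Z" end="2017-08-02T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One dataset instance per day; resolves to e.g. .../20160802.txt -->
    <dataset name="dailyFile" frequency="${coord:days(1)}"
             initial-instance="2016-08-02T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/incoming/${YEAR}${MONTH}${DAY}.txt</uri-template>
      <!-- Empty done-flag: wait for the path itself rather than a _SUCCESS marker -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Each daily action waits for the instance matching its own nominal date -->
    <data-in name="input" dataset="dailyFile">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <!-- Pass the resolved file URI to the workflow -->
        <property>
          <name>inputFile</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

With this shape, each day's action stays in the waiting state until that day's file exists, and the workflow can then read the file path from the inputFile property.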
Please read the documentation; it gives many good examples.
2.DataSet
Upvotes: 1