Sai

Reputation: 1117

How can I schedule an Oozie job to run when something changes in a given folder?

I want to schedule an Oozie job based on a folder, i.e.:

I have a folder in HDFS, and every day one file is added to it, named in the format date.txt (e.g. 20160802.txt).

I want to trigger an Oozie batch job whenever a new file is added to that folder.

How can I schedule this for my use case?

Thanks in advance.

Upvotes: 1

Views: 928

Answers (1)

Taha Naqvi

Reputation: 1766

Oozie workflow jobs run at regular time intervals and/or when data becomes available, and in some cases they can be triggered by an external event. This is where the Coordinator comes into play.

You can use an Oozie coordinator to check the data dependency and trigger an Oozie workflow using the Coordinator EL functions. In your case a file named with the date is added to HDFS every day, so you can achieve this with a dataset.

From the documentation:

Example: a dataset produced once every day at 00:15 PST8PDT, with done-flag set to empty:

  <dataset name="logs" frequency="${coord:days(1)}"
           initial-instance="2009-02-15T08:15Z" timezone="America/Los_Angeles">
    <uri-template>
      hdfs://foo:9000/app/logs/${market}/${YEAR}${MONTH}/${DAY}/data
    </uri-template>
    <done-flag></done-flag>
  </dataset>
The dataset would resolve to the following URIs, and the Coordinator looks for the existence of the directory itself:

  ${market} will be replaced with the user-given property.

  hdfs://foo:9000/app/logs/${market}/200902/15/data
  hdfs://foo:9000/app/logs/${market}/200902/16/data
  hdfs://foo:9000/app/logs/${market}/200902/17/data
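Adapted to the layout in the question (one file per day named like 20160802.txt), a coordinator could look roughly like the sketch below. This is only an illustration under assumptions, not a tested configuration: the folder /data/incoming, the properties ${nameNode} and ${workflowAppPath}, the names daily-file-coord, dailyFile and inputFile, and the start/end dates are all made up for the example. It also assumes that an empty done-flag makes the Coordinator wait for the resolved path itself, which here is a file rather than a directory.

```xml
<!-- Sketch only: paths, property names and dates are hypothetical. -->
<coordinator-app name="daily-file-coord" frequency="${coord:days(1)}"
                 start="2016-08-02T00:00Z" end="2017-08-02T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One dataset instance per day; the URI resolves to e.g. .../20160802.txt -->
    <dataset name="dailyFile" frequency="${coord:days(1)}"
             initial-instance="2016-08-02T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/incoming/${YEAR}${MONTH}${DAY}.txt</uri-template>
      <!-- Empty done-flag: wait for the resolved path itself to exist -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Each coordinator action waits for the dataset instance of its own day -->
    <data-in name="input" dataset="dailyFile">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <!-- Pass the resolved file path to the workflow as ${inputFile} -->
        <property>
          <name>inputFile</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

With this shape, each daily action stays in a waiting state until that day's file shows up, and the workflow only starts afterwards. If the file can arrive late, a timeout in the coordinator controls section may be worth looking at in the documentation.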

Please read the documentation; it is good and gives many examples:

1. About Coordinators

2. DataSet

Upvotes: 1
