Ben Watson

Reputation: 5531

Accessing and manipulating the date in Oozie

I have an Oozie workflow which requires the ability to use different date formats. For example, assume I'm running the workflow on 16th January 2015 using a property runDate=20150116 in job.properties. I'd want to be able to automatically use the following paths in Oozie actions:

external-file-20150116.csv

and some other data named:

/rootDir/resource/150116/*

The first example is easy enough. I'd simply refer to:

external-file-${runDate}.csv

but the second example wouldn't be possible.

The only built-in option I can find is Oozie's EL timestamp() function, which is no use here because it returns a fixed format and offers no manipulation. A coordinator would seem to solve the problem, since I'd be able to use all of the nice coord EL functions. However, I'll need to run this workflow occasionally on an ad-hoc basis, in which case I'd be using a job.properties file and not a coordinator.

Any suggestions as to how I can manipulate dates without using a coordinator?

Upvotes: 2

Views: 2241

Answers (2)

Ben Watson

Reputation: 5531

After lots of messing around and research, I've found the following solution. Unlike the other answer, it does not require inserting one variable per required date format into the job. My solution is based on an EL function, which is basically a UDF but for Oozie.

Solution

Create an EL function that converts a date from one format to another. EL functions are written in Java and, unlike Hive UDFs, do not require extending any class, although any method that will be called by Oozie should be public and static.

The code for this method is:

package org.watsonb.elfunctions;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class DateEL {
    // Parses inputDate using inputDateFormat, then re-renders it using outputDateFormat.
    // Both formats are Joda-Time patterns, e.g. "yyyyMMdd".
    public static String convertDate(String inputDate, String inputDateFormat, String outputDateFormat) {
        DateTimeFormatter inputFormatter = DateTimeFormat.forPattern(inputDateFormat);
        DateTime dateTime = inputFormatter.parseDateTime(inputDate);
        DateTimeFormatter outputFormatter = DateTimeFormat.forPattern(outputDateFormat);
        return outputFormatter.print(dateTime);
    }
}
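
If adding Joda-Time to the jar is inconvenient, the same conversion can probably be done with java.time from the JDK. The following is only a sketch under that assumption (Java 8 or later on the Oozie server), not what I actually deployed. The class name DateElJavaTime is made up for illustration, and because it uses LocalDate it only handles date-only output patterns:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical java.time counterpart of DateEL; not part of the deployed solution.
// Declare the class public (in its own file) before registering it with Oozie.
class DateElJavaTime {
    // Parses inputDate with inputDateFormat, then re-renders it with outputDateFormat.
    // LocalDate carries no time of day, so the output pattern must be date-only.
    public static String convertDate(String inputDate, String inputDateFormat, String outputDateFormat) {
        LocalDate date = LocalDate.parse(inputDate, DateTimeFormatter.ofPattern(inputDateFormat));
        return date.format(DateTimeFormatter.ofPattern(outputDateFormat));
    }

    public static void main(String[] args) {
        System.out.println(convertDate("20150116", "yyyyMMdd", "yyMMdd"));   // prints 150116
        System.out.println(convertDate("20120930", "yyyyMMdd", "yy-MM-dd")); // prints 12-09-30
    }
}
```

Running main locally is a quick way to check a pattern before deploying the jar and restarting Oozie.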

Build this class, and place the generated jar file in /var/lib/oozie on the Oozie server box.

On Ambari's Oozie config page, create or find the oozie.service.ELService.ext.functions.workflow property in the Custom oozie-site.xml tab, and add the following (if it already exists, separate each function declaration with a comma):

convertDateEl=org.watsonb.elfunctions.DateEL#convertDate

In this example:

  • convertDateEl is the name of the function that will be called within Oozie workflows,
  • org.watsonb.elfunctions.DateEL is the fully qualified class name,
  • convertDate is the name of the method in the class.

If not using Ambari, add the property to oozie-site.xml.

Restart the Oozie service. The function is now available to any Oozie workflow.

Usage

Inside a workflow, call:

${convertDateEl(runDate, "yyyyMMdd", "yy-MM-dd")}

to return a formatted date. For example:

<arg>/output/telephone-records-${convertDateEl(runDate, "yyyyMMdd", "yy-MM-dd")}.csv</arg>

would, at runtime, turn into:

<arg>/output/telephone-records-12-09-30.csv</arg>

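if runDate is 20120930. For the second path in the question, an output format of "yyMMdd" would turn a runDate of 20150116 into 150116.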

Source: http://blog.cloudera.com/blog/2013/09/how-to-write-an-el-function-in-apache-oozie/ - I found this useful but a bit too verbose.

Upvotes: 1

zhutoulala

Reputation: 4832

There are three ways to set an Oozie job property:

  1. The job properties file, submitted by the Oozie user when running a job.
  2. config-default.xml, under the same HDFS path as workflow.xml (or coordinator.xml / bundle.xml). This is usually used for specifying default job properties.
  3. The -D option of the Oozie CLI.

In your use case, you could add something like the following to the Oozie command line:

-DrunDate=`date +%Y%m%d`

Upvotes: 1
