Q Boiler

Reputation: 1227

Oozie Workflow with Archive Action

I would like to build an Oozie workflow whose final step, on success, archives the results.

The shell command to do it is:

hadoop archive -archiveName XXX.har -p /some/random/parent directoryToArchive pathToArchiveDestination

I have tried the following

<workflow-app name="HARD_CODED_ARCHIVE_TEST" xmlns="uri:oozie:workflow:0.4">

    <start to="archive"/>
    <action name="archive">
        <archive archiveName="xxx.har" src="/root/src/dir" dest="/path/to/desired/archive/location"/>
        <ok to="end"/>
        <error to="kill"/>
    </action>

    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

The Error I get is something like the following:

WARNING: Exception in Runloop of thread: main with message: E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'archive'. One of '{"uri:oozie:workflow:0.4":map-reduce, "uri:oozie:workflow:0.4":pig, "uri:oozie:workflow:0.4":sub-workflow, "uri:oozie:workflow:0.4":fs, "uri:oozie:workflow:0.4":java, WC[##other:"uri:oozie:workflow:0.4"]}' is expected.

So it is clear that I can't do it this way, because the Oozie workflow schema does not support an "archive" action.

I really don't want to run this via cron, since I would like to archive immediately after a workflow completes successfully. How do I do this?

Upvotes: 0

Views: 1777

Answers (1)

user1569891

Reputation: 46

Try this:

<action name="archive">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>org.apache.hadoop.tools.HadoopArchives</main-class>
        <arg>-archiveName</arg>
        <arg>${YourArchiveName}.har</arg>
        <arg>-p</arg>
        <arg>${FilesParentDirectory}</arg>
        <arg>${SrcDirectory}</arg>
        <arg>${DestDirectory}</arg>
    </java>
    <ok to="end"/>
    <error to="error"/>
</action>

For this to work, the hadoop-archives jar must be available to the workflow: put it in your workflow directory (typically the lib/ subdirectory of the workflow application path) and you should be good to go. Hope that helps!
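For reference, here is a rough sketch of how that action might slot into the workflow from the question. It is untested and rests on a few assumptions: the workflow name is made up, the ${...} properties are expected to come from your job.properties, and the hadoop-archives jar is assumed to already sit alongside the workflow. Note that the error transition has to point at a node that actually exists; the question's workflow names its kill node "kill", so the sketch uses that rather than "error".

```xml
<!-- Sketch only: the workflow name and all ${...} properties are placeholders. -->
<workflow-app name="ARCHIVE_ON_SUCCESS" xmlns="uri:oozie:workflow:0.4">

    <start to="archive"/>

    <!-- Runs the class behind `hadoop archive`; each <arg> is one shell token. -->
    <action name="archive">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>org.apache.hadoop.tools.HadoopArchives</main-class>
            <arg>-archiveName</arg>
            <arg>${YourArchiveName}.har</arg>
            <arg>-p</arg>
            <arg>${FilesParentDirectory}</arg>
            <arg>${SrcDirectory}</arg>
            <arg>${DestDirectory}</arg>
        </java>
        <ok to="end"/>
        <!-- Must match an existing node name; "kill" is the kill node below. -->
        <error to="kill"/>
    </action>

    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

In a real workflow, the start node would point at your processing action instead, and that action's ok transition would go to "archive", so the archive step only runs after the earlier steps succeed.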

Upvotes: 3
