Shulhi Sapli

Reputation: 2476

Writing to a file from a jar run via an Oozie shell action

I have a jar file that needs to be run before our MapReduce process. It preprocesses the data that is later fed into the MapReduce job. The jar works fine without Oozie, but I would like to automate the workflow.

When run, the jar accepts two arguments, <input_file> and <output_dir>, and it is expected to produce two files, <output_file_1> and <output_file_2>, under the specified <output_dir>.

This is the workflow:

<workflow-app name="RI" xmlns="uri:oozie:workflow:0.4">
    <start to="RI"/>
    <action name="RI">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>java</exec>
            <argument>-jar</argument>
            <argument>RI-Sequencer.jar</argument>
            <argument>log.csv</argument>
            <argument>/tmp</argument>
            <file>/user/root/algo/RI-Sequencer.jar#RI-Sequencer.jar</file>
            <file>/user/root/algo/log.csv#log.csv</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I run the task using Hue, and currently I can't get the output of the process written to files. The job runs fine, but the expected files are nowhere to be found.

I have also changed the output directory to be in HDFS, but with the same result: no files are generated.

If it helps, this is a sample of the code from my jar file:

File fileErr = new File(targetPath + "\\input_RI_err.txt");
fileErr.createNewFile();
BufferedWriter textFileErr = new BufferedWriter(new FileWriter(fileErr));
//
// fill in the buffer with the result
//
textFileErr.close();

UPDATE: If it helps, I can upload the jar file for testing.

UPDATE 2: I've changed the code to write to HDFS. It still doesn't work when the job is executed through Oozie, but running the job independently works.

Upvotes: 0

Views: 2128

Answers (2)

Radek Tomšej

Reputation: 490

I do not understand why you want to preprocess the data before MapReduce; I don't think it is very efficient. But as Romain said, you are saving your output file to the local filesystem (the file should be in your user home folder ~/). If you want to write your data into HDFS directly from Java (without using the MapReduce library), look here - How to write a file in HDFS using hadoop or Write a file in hdfs with java.
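
For reference, a minimal sketch of writing a file straight to HDFS with the Hadoop FileSystem API (the class name and output path below are made up for illustration; the linked questions cover this in more detail):

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the Hadoop config on the classpath,
        // so the Path below resolves to HDFS instead of the local disk.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative target path; replace with the real <output_dir>.
        Path errFile = new Path("/user/root/algo/output/input_RI_err.txt");

        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fs.create(errFile, true)))) {
            out.write("result lines go here");
            out.newLine();
        }
        fs.close();
    }
}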

Alternatively, you can generate the file in a local directory and then load it into HDFS with this command:

hdfs dfs -put <localsrc> ... <dst>
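
For example, with the error file from the question (the HDFS target directory here is made up for illustration):

hdfs dfs -put /tmp/input_RI_err.txt /user/root/algo/output/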

Upvotes: 0

Romain

Reputation: 7082

It seems like you are creating a regular output file on the local filesystem, not in HDFS. Since the shell action runs on one of the nodes of the cluster, the output ends up in the local /tmp of whichever machine was picked.
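
If keeping the local write is easier, one option is to copy the finished files up to HDFS at the end of the program. A sketch, with made-up paths and a hypothetical helper name:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyOutputsToHdfs {
    // Hypothetical helper: copies a locally written result file into HDFS.
    public static void copyUp(String localFile, String hdfsDir) throws IOException {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.copyFromLocalFile(new Path(localFile), new Path(hdfsDir));
        }
    }

    public static void main(String[] args) throws IOException {
        // Paths are illustrative only.
        copyUp("/tmp/input_RI_err.txt", "/user/root/algo/output/");
    }
}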

Upvotes: 2
