sri_sankl

Reputation: 243

How to get Output data from hadoop?

I have created a jar that runs a MapReduce job and writes its output to a directory on HDFS. I need to read that output directory from my Java code, which runs outside the Hadoop environment, without first copying the files to a local directory. I am using ProcessBuilder to run the jar. Can anyone help me?
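For reference, a minimal sketch of launching a jar with ProcessBuilder and waiting for it to finish. The `hadoop jar` command line shown in the comment is an assumption about how the job is started; a harmless `java -version` stand-in is used here so the snippet runs without a cluster:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RunJar {
    public static void main(String[] args) throws IOException, InterruptedException {
        // A real invocation would look something like (paths are hypothetical):
        // List<String> cmd = Arrays.asList("hadoop", "jar", "/home/user/myjob.jar", "/input", "/mapout");
        // "java -version" is a portable stand-in so this sketch runs anywhere.
        List<String> cmd = Arrays.asList("java", "-version");

        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true); // merge stderr (where the JVM/job logs go) into stdout
        pb.inheritIO();               // stream the child's output to this console
        Process p = pb.start();
        int exit = p.waitFor();       // block until the child process finishes
        System.out.println("process exited with " + exit);
    }
}
```

Waiting on `waitFor()` matters here: only after the job process exits is the output directory guaranteed to be fully written and safe to read.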

Upvotes: 1

Views: 318

Answers (2)

Tariq

Reputation: 34184

What's the problem with reading the HDFS data directly using the HDFS API?

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public static void main(String[] args) throws IOException {
        // Point the client at the cluster's configuration files.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream inputStream = fs.open(new Path("/mapout/input.txt"));
        // readLine() on the raw stream is deprecated; wrap it in a BufferedReader.
        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
        System.out.println(reader.readLine());
        reader.close();
    }

Your program can run outside the Hadoop cluster, but the Hadoop daemons must be running on the cluster itself.

Upvotes: 1

Magham Ravi

Reputation: 603

You can add the following code to your MR driver to read the job's output once it completes.

    job.waitForCompletion(true);

    FileSystem fs = FileSystem.get(conf);
    // List only the part-* output files, skipping _SUCCESS and _logs.
    Path[] outputFiles = FileUtil.stat2Paths(fs.listStatus(output, new OutputFilesFilter()));

    for (Path file : outputFiles) {
        InputStream is = fs.open(file);
        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        String line;
        while ((line = reader.readLine()) != null) {
            // process each output record here
        }
        reader.close();
    }

Upvotes: 1
