Reputation: 21
It is possible to read the file directly from HDFS without copying it to the local file system, but I copied the results to the local file system instead:
hduser@ubuntu:/usr/local/hadoop$ mkdir /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/11/17 21:58:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
getmerge: `/tmp/gutenberg-output': Is a directory

How can I fix this error?
Upvotes: 1
Views: 309
Reputation: 1397
You seem to be trying to read the HDFS directory itself instead of the files inside it.
The good thing about HDFS, though, is that it follows a number of Unix command-line conventions, so you can read the contents of a file under that directory (where the output of your job is supposedly stored) by using the cat command, like this:
hadoop fs -cat output_directory/part-r-00000
Here output_directory is the name of the directory where your desired output is stored, and part-r-00000 is the name of the file (or the first of a set of files named part-r-00000, part-r-00001, etc., depending on the number of reducers your job defines) that holds the results of the job.
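If you are not sure what the output files are called, you can list the output directory first and then cat all the part files at once with a glob. A minimal sketch, assuming the output directory from your question, /user/hduser/gutenberg-output:

hadoop fs -ls /user/hduser/gutenberg-output
hadoop fs -cat /user/hduser/gutenberg-output/part-*

The glob has the advantage of working regardless of how the part files are named.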
If the above command throws an error saying there is no such file, then either your job ran into a problem before writing its output key-value pairs, or your version of Hadoop is a bit older and names the output file(s) part-00000, part-00001, and so on.
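As for the getmerge error in your question: getmerge takes a source directory and a destination local file, and your destination /tmp/gutenberg-output already exists as a directory (you created it with mkdir just before), which is why it fails with "Is a directory". A sketch of the corrected command, assuming the same paths as in your question and using merged.txt as a hypothetical output file name:

hadoop fs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output/merged.txt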
As an example, the output in the screenshot below is from a job whose results were stored under the wc_out directory in HDFS:
Upvotes: 0