Reputation: 21
It is possible to read the file directly from HDFS without copying it to the local file system, but I copied the results to the local file system instead:
hduser@ubuntu:/usr/local/hadoop$ mkdir /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/11/17 21:58:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
getmerge: `/tmp/gutenberg-output': Is a directory

How can I fix this error?
Upvotes: 1
Views: 309
Reputation: 1397
You seem to be trying to read the HDFS directory itself instead of the files inside it.
The good thing about HDFS, though, is that it follows a number of Unix command-line conventions, so you can read the contents of a file under that directory (where the output of your job is supposedly stored) by using the cat command, like this:
hadoop fs -cat output_directory/part-r-00000
Here output_directory is the name of the directory where your desired output is stored, and part-r-00000 is the name of the file (or the first of a set of files named part-r-00000, part-r-00001, etc., depending on the number of reducers your job defines) that holds the results of the job.
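If you are not sure what the output files are called, you can list the output directory first and then cat all the part files at once with a glob. A minimal sketch, assuming the output directory from your question, /user/hduser/gutenberg-output:

hadoop fs -ls /user/hduser/gutenberg-output
hadoop fs -cat /user/hduser/gutenberg-output/part-*

The glob has the advantage of working regardless of how the part files are named.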
If the above command throws an error saying there is no such file, then either your job ran into a problem before writing its output key-value pairs, or your version of Hadoop is a bit older and names the output file(s) part-00000, part-00001, and so on.
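As for the getmerge error in your question: getmerge takes a source directory and a destination local file, and your destination /tmp/gutenberg-output already exists as a directory (you created it with mkdir just before), which is why it fails with "Is a directory". A sketch of the corrected command, assuming the same paths as in your question and using merged.txt as a hypothetical output file name:

hadoop fs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output/merged.txt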
As an example, the output in the screenshot below is from a job whose results were stored under the wc_out directory in HDFS:
Upvotes: 0