Reputation: 690
Is there a way to read any file format from HDFS directly by using the HDFS path, instead of having to pull the file from HDFS to the local filesystem and read it there?
Upvotes: 3
Views: 26577
Reputation: 191894
You have to pull the entire file either way. Whether you use the cat or text command, the entire file is still streamed to your shell; there's just no remnant of the file left on disk when the command ends. So, if you plan on inspecting the file a few times, it's better to get it (hdfs dfs -get) onto the local filesystem first.
As an HDFS client, you must contact the NameNode to acquire all block locations for a particular file before you can read it.
Upvotes: 2
Reputation: 10092
You can use the cat command on HDFS to read regular text files:
hdfs dfs -cat /path/to/file.csv
To read compressed files such as gz, bz2, etc., you can use:
hdfs dfs -text /path/to/file.gz
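Under the hood, -text detects the compression codec (for gz files, by the extension) and decompresses the stream before printing it. As a purely local illustration of that behaviour, with a made-up file name, the same effect for a .gz file can be reproduced with gunzip -c:

```shell
# create a small gzipped CSV locally (hypothetical data and path)
printf 'id,name\n1,alice\n' | gzip > /tmp/sample.csv.gz

# stream it back decompressed, which is roughly what
# `hdfs dfs -text` does for a .gz file stored in HDFS
gunzip -c /tmp/sample.csv.gz
```

This only sketches the decompression step; -text itself still reads the blocks from HDFS rather than the local filesystem.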
These are the two read methods that Hadoop supports natively through its FsShell commands. For other, more complex file formats you will need a different approach, such as a Java program using the HDFS API, or something along those lines.
Upvotes: 6
Reputation: 563
You can try with hdfs dfs -cat:
Usage: hdfs dfs -cat [-ignoreCrc] URI [URI ...]
hdfs dfs -cat /your/path
Upvotes: 2