frazman

Reputation: 33293

view contents of file in hdfs hadoop

Probably a noob question, but is there a way to read the contents of a file in HDFS besides copying it to local and reading it through Unix?

So right now what I am doing is:

  bin/hadoop dfs -copyToLocal hdfs/path local/path

  nano local/path

I am wondering if I can open a file directly in HDFS rather than copying it to local and then opening it.

Upvotes: 42

Views: 113159

Answers (7)

DevPete

Reputation: 19

I was trying to figure out the commands above and they didn't work for me to read the file. But this did:

cat <filename>

For example,

cat data.txt

Upvotes: -1

0xc0de

Reputation: 8307

I usually use

$ hdfs dfs -cat <filename> | less

This also helps me to search for words to find what I'm interested in while looking at the contents.

For simpler purposes where context doesn't matter, like checking whether a particular word exists in a file or counting word occurrences, I use:

$ hdfs dfs -cat <filename> | grep <search_word>

Note: grep also has a -C option for context, along with -A and -B for lines after/before the match.
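As a minimal sketch (the HDFS path and search term below are just placeholders), the first command prints two lines of context around each match, and the second counts the lines that contain the word:

$ hdfs dfs -cat /user/hadoop/logs/app.log | grep -C 2 'ERROR'
$ hdfs dfs -cat /user/hadoop/logs/app.log | grep -c 'ERROR'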

Upvotes: 1

Manish Barnwal

Reputation: 281

If the file is huge (which will be the case most of the time), 'cat' will flood your terminal with the entire contents of the file. Instead, pipe the output and fetch only a few lines.

To get the first 10 lines of the file:

hadoop fs -cat 'file path' | head -10

To get the last 5 lines of the file:

hadoop fs -cat 'file path' | tail -5
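Depending on your Hadoop version, there is also a built-in -tail option that prints roughly the last kilobyte of a file without needing a Unix pipe (newer releases add a matching -head); the path here is again a placeholder:

hadoop fs -tail 'file path'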

Upvotes: 18

Kyle Bridenstine

Reputation: 6393

  1. SSH onto your EMR cluster:

     ssh hadoop@emrClusterIpAddress -i yourPrivateKey.ppk

  2. Run this command:

     /usr/lib/spark/bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://yourEmrClusterIpAddress:8020/eventLogging --class org.apache.spark.examples.SparkPi --master yarn --jars /usr/lib/spark/examples/jars/spark-examples_2.11-2.4.0.jar

  3. List the contents of the directory we just created, which should now have a new log file from the run above:

     [hadoop@ip-1-2-3-4 bin]$ hdfs dfs -ls /eventLogging
     Found 1 items
     -rwxrwx--- 1 hadoop hadoop 53409 2019-05-21 20:56 /eventLogging/application_1557435401803_0106

  4. Now view the file:

     hdfs dfs -cat /eventLogging/application_1557435401803_0106

Resources: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html

Upvotes: 1

Mhamad El Itawi

Reputation: 254

If you are using Hadoop 2.x, you can use

hdfs dfs -cat <file>

Upvotes: 6

cool.ernest.7

Reputation: 93

hadoop dfs -cat <filename>

or

hadoop dfs -cat <outputDirectory>/*
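The glob form is handy for MapReduce-style output, where results are split across several part files: -cat accepts a glob and streams every matching file in turn. A minimal sketch, assuming a hypothetical job output directory:

hadoop fs -cat /user/hadoop/wordcount/output/part-*

Note that on Hadoop 2.x the 'hadoop dfs' form is deprecated in favour of 'hdfs dfs' (or 'hadoop fs'); the syntax is otherwise the same.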

Upvotes: 4

Quetzalcoatl

Reputation: 3067

I believe hadoop fs -cat <file> should do the job.

Upvotes: 72
