전원표

Reputation: 170

How can I access HDFS from each node of a Hadoop YARN application?

How do I read and write files on HDFS from each node in a Hadoop (2.5.2) YARN application? As far as I know, YARN applications run on HDFS. I could not find a website or document that answers these basic questions, so I am asking here.

Upvotes: 0

Views: 1551

Answers (3)

OneCricketeer

Reputation: 191953

YARN applications run (or at least request memory space) in YARN, not HDFS.

HDFS is only a filesystem for file storage. You read and write using many tools, such as the Hadoop CLI, MapReduce, Apache Spark, etc.

With the CLI, for example: hadoop fs -put /local-file hdfs://remote/file

Whether those applications are managed by YARN is unrelated to how HDFS files are accessed.

You do not need YARN for HDFS. It is entirely separate; YARN is a resource negotiator (it's in the name). Whichever node a YARN container is run on could be an HDFS datanode within a Hadoop environment, but that's not always true - it's just good design, as the data would then be NODE_LOCAL in HDFS terms, so no data needs to be shuffled around the cluster.
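For instance, Apache Spark addresses HDFS paths the same way regardless of whether the job is submitted to YARN. A minimal PySpark sketch (the namenode host/port and the paths below are placeholder assumptions, not from this answer):

from pyspark.sql import SparkSession

# Whether this runs under YARN or locally does not change how
# HDFS paths are addressed.
spark = SparkSession.builder.appName("hdfs-demo").getOrCreate()

# Read a text file from HDFS (hdfs://namenode:8020 is a placeholder).
lines = spark.read.text("hdfs://namenode:8020/data/input.txt")
print(lines.count())

# Write the data back out to HDFS.
lines.write.mode("overwrite").text("hdfs://namenode:8020/data/output")

spark.stop()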

Upvotes: 1

Remus Rusanu

Reputation: 294407

Read the HDFS Users Guide. There are numerous client options: libhdfs, the FileSystem API from Java, the WebHDFS REST API, or forking to a shell and running hadoop fs commands. If your 'YARN application' is an M/R app then all of this is already handled by M/R and you only need to consume the input you're given.
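For example, here is a minimal Python sketch against the WebHDFS REST API. It assumes WebHDFS is enabled on the namenode; the host, the port 50070 (the Hadoop 2.x default), the user name, and the paths are all placeholders to adapt to your cluster:

import requests

NAMENODE = "http://namenode:50070"  # placeholder namenode host:port
USER = "hadoop"                     # hypothetical user name

# List a directory; LISTSTATUS returns JSON metadata for each entry.
resp = requests.get(NAMENODE + "/webhdfs/v1/apps?op=LISTSTATUS&user.name=" + USER)
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["type"])

# Read a file; OPEN answers with a 307 redirect to a datanode,
# which requests follows automatically.
resp = requests.get(NAMENODE + "/webhdfs/v1/apps/input.txt?op=OPEN&user.name=" + USER)
resp.raise_for_status()
print(resp.text[:200])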

Upvotes: 0

abhiieor

Reputation: 3554

In a MapReduce action (Java-based or streaming), Spark, etc., all of HDFS is accessible to the program running under YARN as if it were native file storage. So, much like with local file storage, you just read the stored data using the usual file-read commands. For example, in R streaming:

library(data.table)

# Merge the HDFS partition directory into a single local file.
path1 <- paste0("hadoop fs -getmerge /apps/hive/warehouse/",hive_db,".db/dsp/mdse_dept_ref_i=",dept,"/mdse_clas_ref_i=",clas,
                " dspD",dept,"C",clas,".txt")
system(command = path1)
# Read the merged file; "\001" is Hive's default field delimiter.
filename <- paste0("dspD",dept,"C",clas,".txt")
item_sls <- data.table(read.table(filename,sep="\001"))

Here I am just merging an HDFS folder into a local file using hadoop fs -getmerge, then loading that file into a data.table structure in R. In the same way, you can issue HDFS commands using the subprocess package in Python and store the result in pandas, as in the sketch below.
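The same pattern in Python would look roughly like this (a sketch only; the HDFS path and the local file name are hypothetical stand-ins for the ones built in the R example above):

import subprocess
import pandas as pd

hdfs_dir = "/apps/hive/warehouse/mydb.db/dsp"  # hypothetical HDFS directory
local_file = "dsp_merged.txt"                  # hypothetical local file name

# Merge the HDFS directory's part files into one local file.
subprocess.check_call(["hadoop", "fs", "-getmerge", hdfs_dir, local_file])

# Load the merged file; \x01 (Ctrl-A) is Hive's default field delimiter.
df = pd.read_csv(local_file, sep="\x01", header=None)
print(df.head())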

Upvotes: 0
