Reputation: 170
How do I read and write HDFS files from each node in a Hadoop (2.5.2) YARN application? I was under the impression that YARN applications run on HDFS. I could not find a website or document that answers these basic questions, so I am asking here.
Upvotes: 0
Views: 1551
Reputation: 191953
YARN applications run (or at least request memory space) in YARN, not HDFS.
HDFS is only a filesystem for file storage. You read and write using many tools, such as the Hadoop CLI, MapReduce, Apache Spark, etc.
The CLI, for example: hadoop fs -put /local-file hdfs://remote/file copies a local file into HDFS.
Whether those applications are managed by YARN is unrelated to how HDFS files are accessed.
You do not need YARN for HDFS. YARN is an entirely separate component that negotiates compute resources such as memory (the name stands for Yet Another Resource Negotiator). Whichever node a YARN container runs on could also be an HDFS datanode within a Hadoop environment, but that is not always the case - it is simply good design, because the data would then be NODE_LOCAL in HDFS terms, so no data needs to be shuffled around the cluster.
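To make the point concrete that HDFS access is just a client call, independent of YARN: a minimal Python sketch of issuing the same hadoop fs -put command via subprocess. The paths are placeholders, and the actual execution line is commented out since it assumes a node with the Hadoop CLI installed.

```python
import subprocess

def hdfs_put_command(local_path, hdfs_path):
    """Build the Hadoop CLI command to copy a local file into HDFS.

    Any HDFS client builds and runs the same command, whether or not
    it happens to be executing inside a YARN container.
    """
    return ["hadoop", "fs", "-put", local_path, hdfs_path]

cmd = hdfs_put_command("/local-file", "hdfs://remote/file")
print(" ".join(cmd))

# On a node with the Hadoop CLI installed, you would run it like this:
# subprocess.run(cmd, check=True)
```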
Upvotes: 1
Reputation: 294407
Read the HDFS Users Guide. There are numerous ways to access HDFS: client libraries like libhdfs, the FileSystem API from Java, the WebHDFS REST API, or forking to a shell and running hadoop fs commands. If your 'YARN application' is an M/R app, then all of this is already handled by M/R and you only need to consume the input you are given.
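As a sketch of the WebHDFS route mentioned above: WebHDFS exposes HDFS operations as HTTP requests under the /webhdfs/v1 path on the NameNode (port 50070 is the Hadoop 2.x default for the NameNode web UI). The hostname and file path below are illustrative placeholders.

```python
from urllib.parse import urlencode

def webhdfs_url(namenode_host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation.

    50070 is the default NameNode HTTP port in Hadoop 2.x; adjust it
    for your cluster. Extra keyword args become query parameters.
    """
    query = urlencode(dict(params, op=op))
    return f"http://{namenode_host}:{port}/webhdfs/v1{path}?{query}"

# Reading a file is an HTTP GET with op=OPEN:
url = webhdfs_url("namenode.example.com", "/user/alice/data.txt", "OPEN")
print(url)

# Fetching this URL (e.g. with urllib.request.urlopen or curl) returns a
# redirect to a DataNode, from which the file contents are streamed.
```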
Upvotes: 0
Reputation: 3554
In a map-reduce action (Java-based or streaming), Spark, etc., all of HDFS is accessible to a program running under YARN as its native file storage. So, much like with local file storage, you read the stored data using the usual file-read commands. For example, in R streaming:
library(data.table)  # for data.table()

# Merge the partitioned HDFS directory into a single local file
path1 <- paste0("hadoop fs -getmerge /apps/hive/warehouse/", hive_db,
                ".db/dsp/mdse_dept_ref_i=", dept, "/mdse_clas_ref_i=", clas,
                " dspD", dept, "C", clas, ".txt")
system(command = path1)

# Read the merged file (Hive's default \001 field separator) into a data.table
filename <- paste0("dspD", dept, "C", clas, ".txt")
item_sls <- data.table(read.table(filename, sep = "\001"))
Here I am reading an HDFS folder using hadoop fs -getmerge and writing its contents to a local file, which is then loaded into a data.table structure in R. In the same way, you can issue HDFS commands from Python using the subprocess package and store the result in pandas.
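A Python analogue of the R flow above, as a sketch: run hadoop fs -getmerge via subprocess, then parse the \x01-delimited (Hive default) output. Since no cluster is available here, the getmerge call is commented out and a small \x01-separated file stands in for its output; the Hive warehouse path is the same illustrative one as in the R example.

```python
import csv
import os
import subprocess  # used to run the hadoop CLI on a real cluster
import tempfile

# On a cluster, you would first merge the HDFS directory to a local file:
# subprocess.run(["hadoop", "fs", "-getmerge",
#                 "/apps/hive/warehouse/mydb.db/dsp", "dsp.txt"], check=True)

def read_ctrl_a_file(path):
    """Parse a \x01-delimited text file (Hive's default field separator)."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f, delimiter="\x01")]

# Stand-in for the getmerge output, so the example runs without a cluster:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("1\x01shirt\x0119.99\n2\x01socks\x014.50\n")
    name = f.name

rows = read_ctrl_a_file(name)
os.remove(name)
print(rows)
```

With pandas installed, pandas.read_csv(path, sep="\x01", header=None) would load the same merged file directly into a DataFrame, mirroring the data.table step in R.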
Upvotes: 0