Reputation: 1042
I want to use Spark Streaming to retrieve data from Kafka and save it to a remote HDFS. I know that I have to use the function saveAsTextFile. However, I don't know precisely how to specify the path.
Is that correct if I write this:
myDStream.foreachRDD(frm -> {
    frm.saveAsTextFile("hdfs://ip_addr:9000//home/hadoop/datanode/myNewFolder");
});
where ip_addr
is the IP address of my remote HDFS server,
/home/hadoop/datanode/
is the DataNode directory created when I installed Hadoop (I don't know whether I have to specify this directory), and
myNewFolder
is the folder where I want to save my data.
Thanks in advance.
Yassir
Upvotes: 4
Views: 16217
Reputation: 18270
The path has to be a directory in HDFS.
For example, suppose you want to save the files inside a folder named myNewFolder
under the root /
path in HDFS.
The path to use would be hdfs://namenode_ip:port/myNewFolder/
On execution of the Spark job, this directory myNewFolder
will be created.
The datanode data directory, which is given by dfs.datanode.data.dir
in hdfs-site.xml,
is where the DataNode stores the blocks of the files you write to HDFS on its local disk; it should not be referenced as an HDFS directory path.
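A minimal sketch of how such a path is assembled, assuming a NameNode reachable at namenode_ip on port 9000 (the usual fs.defaultFS port; the class and method names here are hypothetical, used only for illustration):

```java
// Sketch: building the HDFS URI that saveAsTextFile expects.
// "namenode_ip" and 9000 are placeholders for your cluster's
// fs.defaultFS host and port -- NOT the datanode data directory.
public class HdfsPathExample {

    // Concatenate scheme, NameNode address, and target folder.
    static String hdfsPath(String namenodeHost, int port, String folder) {
        return "hdfs://" + namenodeHost + ":" + port + "/" + folder;
    }

    public static void main(String[] args) {
        String out = hdfsPath("namenode_ip", 9000, "myNewFolder");
        System.out.println(out); // hdfs://namenode_ip:9000/myNewFolder

        // In the streaming job, the path is then passed unchanged, e.g.:
        // myDStream.foreachRDD(rdd -> rdd.saveAsTextFile(out));
    }
}
```

Note that local filesystem paths on the NameNode or DataNode machines (such as /home/hadoop/datanode) never appear in this URI; only the logical HDFS path after the port does.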
Upvotes: 7