Yassir S

Reputation: 1042

How to save data in HDFS with Spark?

I want to use Spark Streaming to retrieve data from Kafka. Now, I want to save my data to a remote HDFS. I know that I have to use the function saveAsTextFile. However, I don't know precisely how to specify the path.

Is that correct if I write this:

myDStream.foreachRDD(frm->{
    frm.saveAsTextFile("hdfs://ip_addr:9000//home/hadoop/datanode/myNewFolder");
});

where ip_addr is the IP address of my remote HDFS server, /home/hadoop/datanode/ is the DataNode HDFS directory created when I installed Hadoop (I don't know whether I have to specify this directory), and myNewFolder is the folder where I want to save my data.

Thanks in advance.

Yassir

Upvotes: 4

Views: 16217

Answers (1)

franklinsijo

Reputation: 18270

The path has to be a directory in HDFS.

For example, to save the files inside a folder named myNewFolder under the root path / in HDFS, the path to use would be hdfs://namenode_ip:port/myNewFolder/.

On execution of the spark job this directory myNewFolder will be created.
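To make the distinction concrete, here is a minimal sketch in Java showing how the output URI could be built from the NameNode address (the host name namenode_ip, port 9000, and the helper hdfsOutputPath are illustrative assumptions, not part of the original question):

```java
public class HdfsPathExample {
    // Build the HDFS output directory URI from the NameNode host and RPC port.
    // This must point at the NameNode (the address in fs.defaultFS), NOT at a
    // DataNode data directory such as /home/hadoop/datanode.
    static String hdfsOutputPath(String namenodeHost, int rpcPort, String dir) {
        return "hdfs://" + namenodeHost + ":" + rpcPort + dir;
    }

    public static void main(String[] args) {
        String out = hdfsOutputPath("namenode_ip", 9000, "/myNewFolder");
        System.out.println(out); // hdfs://namenode_ip:9000/myNewFolder

        // In the streaming job this would be used roughly like (sketch, assuming
        // a JavaDStream<String> myDStream; saveAsTextFile fails if the target
        // directory already exists, so a per-batch subdirectory is common):
        // myDStream.foreachRDD(rdd ->
        //     rdd.saveAsTextFile(out + "/" + System.currentTimeMillis()));
    }
}
```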

The datanode data directory, which is given by dfs.datanode.data.dir in hdfs-site.xml, is used to store the blocks of the files you store in HDFS, and should not be referenced as an HDFS directory path.
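For reference, that property typically appears in hdfs-site.xml like this (the value shown reuses the asker's local path and is only illustrative):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Local filesystem path where the DataNode keeps block files;
       it is not addressable as a path inside HDFS. -->
  <value>/home/hadoop/datanode</value>
</property>
```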

Upvotes: 7
