Reputation: 6475
I'd like to know whether it is possible to access HDFS from the driver in a Spark application. That is, how to read/write a file from/to HDFS in the driver program. One possible solution is to read the file as an RDD (sc.textFile) and then collect it in the driver. However, that is not what I'm looking for.
Upvotes: 1
Views: 1988
Reputation: 3055
If you want to access HDFS directly from the driver, you can simply do (in Scala):
import org.apache.hadoop.fs.FileSystem
val hdfs = FileSystem.get(sc.hadoopConfiguration)
Then you can use the variable hdfs created this way to access HDFS directly as a file system, without going through Spark.
(In the code snippet I assumed you have a properly configured SparkContext called sc.)
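For example, here is a minimal sketch of writing and then reading a file from the driver through that handle; the path /tmp/driver-output.txt is only a placeholder:

import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

val hdfs = FileSystem.get(sc.hadoopConfiguration)

// Write a small text file to HDFS from the driver
val writer = new PrintWriter(hdfs.create(new Path("/tmp/driver-output.txt")))
try writer.println("written from the driver") finally writer.close()

// Read it back line by line
val reader = new BufferedReader(new InputStreamReader(hdfs.open(new Path("/tmp/driver-output.txt"))))
try Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(println)
finally reader.close()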
Upvotes: 4
Reputation: 3692
Simply collect all the data at the driver with the collect action and use the HDFS Java API to write it to HDFS.
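A minimal sketch of that approach, assuming an RDD[String] named rdd and a SparkContext named sc (both names and the output path are illustrative); note that collect brings the whole dataset into driver memory:

import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.PrintWriter

val lines = rdd.collect()  // pulls the entire RDD into driver memory
val hdfs = FileSystem.get(sc.hadoopConfiguration)
val writer = new PrintWriter(hdfs.create(new Path("/tmp/collected.txt")))
try lines.foreach(line => writer.println(line)) finally writer.close()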
Upvotes: -3