Reputation: 721
Imagine I have two slaves and one master. Previously, I copied the same data onto all the slave nodes and read it with:
JavaPairRDD<IntWritable, VectorWritable> seqVectors =
        sc.sequenceFile(inputPath, IntWritable.class, VectorWritable.class);
Here inputPath is not an HDFS path but a local path that every slave node can access. Now I am considering a situation where each slave holds only part of the data, and I want to use the same code without installing/working with HDFS. The problem is that the same code now runs without any error but does not produce any result, because each node can only read the partial data on its own disk.
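For reference, here is a minimal, self-contained version of the setup that worked when every node had a full copy (the path file:///data/vectors.seq, the app name, and the class name are placeholders; VectorWritable here is Mahout's org.apache.mahout.math.VectorWritable):

import org.apache.hadoop.io.IntWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalSeqFileRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LocalSeqFileRead");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The "file://" scheme forces Spark to read from the local
        // filesystem instead of the default filesystem (usually HDFS).
        String inputPath = "file:///data/vectors.seq";

        // Each executor that gets a partition of this RDD opens this exact
        // path on its own local disk, so the complete file must exist at
        // the same location on the driver and on every worker.
        JavaPairRDD<IntWritable, VectorWritable> seqVectors =
                sc.sequenceFile(inputPath, IntWritable.class, VectorWritable.class);

        System.out.println("Records read: " + seqVectors.count());
        sc.close();
    }
}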
My question is: how can I run my program in this new situation, without any third-party software?
Upvotes: 0
Views: 35
Reputation: 35249
You cannot. If you want to run Spark "without installing/working with HDFS" (or other distributed storage), you have to provide a full copy of the data on every node, including the driver. Obviously, that is not very useful in practice.
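For illustration, if only the driver holds a full copy of the data, one way to avoid worker-side copies is to read the sequence file on the driver with Hadoop's SequenceFile.Reader and ship the records to the executors with parallelize(). This is only a sketch (the path /data/vectors.seq and the class name are placeholders, and VectorWritable is assumed to be Mahout's), and it still requires a single machine to hold and transfer the whole data set:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class DriverSideSeqRead {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("DriverSideSeqRead");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Hypothetical local path that exists on the driver machine only.
        Path path = new Path("/data/vectors.seq");
        Configuration hadoopConf = new Configuration();

        // Read the whole sequence file on the driver; no worker needs a copy.
        List<Tuple2<Integer, double[]>> records = new ArrayList<>();
        try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(hadoopConf, SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            VectorWritable value = new VectorWritable();
            while (reader.next(key, value)) {
                // Copy into plain Java types: the reader reuses the Writable
                // instances, so they should not be shipped as-is.
                Vector v = value.get();
                double[] dense = new double[v.size()];
                for (int i = 0; i < v.size(); i++) {
                    dense[i] = v.getQuick(i);
                }
                records.add(new Tuple2<>(key.get(), dense));
            }
        }

        // parallelize() sends the data from the driver to the executors,
        // so the file itself never has to exist on the workers.
        JavaRDD<Tuple2<Integer, double[]>> vectors = sc.parallelize(records);
        System.out.println("Records distributed: " + vectors.count());
        sc.close();
    }
}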
Upvotes: 1