Reputation: 18098
If we have a file of 128MB with an HDFS split of 128MB and we issue sc.textFile(xxx,4), what actually happens? What does the RDD in fact mean in this case in terms of partitioning? 4 processing partitions still or just 1?
Upvotes: 0
Views: 117
Reputation: 711
When you use a code like this:
JavaRDD<String> in = sc.textFile(xxx,4);
in.persist();
Then your RDD has 4 Partitions. They should have a size of 32 MB each. Then you can do something likes this:
rdd.count()
When you run then your code locally with local[4], then the count will be executed with 4 processes (tasks) in parallel.
Upvotes: 1