rura6502

Reputation: 385

How does Apache Spark process non-RDD code such as System.out, for, and while?

When I write an iterator (a for or while loop) or other non-RDD code, how does Spark actually process it, and how is it separated from the RDD work?

Like this:

public static void main(String[] args) {
    JavaSparkContext sc = ....;
    int sum = 0;
    for (int i = 0; i < 1000000; i++)  // fixed condition: was 0<1000000, an infinite loop
        sum += i;
    sc.writeHadoop("/user/text/test.txt");  // placeholder for a Spark write action
}

Upvotes: 0

Views: 51

Answers (1)

Thiago Baldim

Reputation: 7742

This code is handled by the Driver: every block of code that is outside the Apache Spark framework runs in the Driver.

That is why you need to understand how much memory your driver will use: if you do complex computation there, or call collect() for any reason, all of that processing happens in the Driver. See the image below:

*(image: Spark architecture diagram showing the Driver and the Executors)*
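To illustrate the collect() point: collect() pulls every partition of an RDD back into driver memory, so the driver must be sized to hold the whole dataset. A minimal sketch, assuming a JavaSparkContext named sc already exists:

```java
// collect() materializes the entire RDD on the driver JVM;
// on a large dataset this is what causes driver OutOfMemoryError.
List<Integer> data = sc.parallelize(java.util.Arrays.asList(1, 2, 3, 4))
                       .collect();  // all elements now live in driver memory

// Prefer actions that keep data distributed (e.g. saveAsTextFile),
// or take(n)/sample for inspecting a small subset.
```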

So everything that runs outside Spark, as your code shows:

public static void main(String[] args) {
    JavaSparkContext sc = ....;
    int sum = 0;
    for (int i = 0; i < 1000000; i++)  // fixed condition: was 0<1000000, an infinite loop
        sum += i;
    sc.writeHadoop("/user/text/test.txt");  // placeholder for a Spark write action
}

Everything before the sc.writeHadoop call runs entirely in the driver. Only after it finishes are the workers invoked.
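If you want the executors, rather than the driver, to do the summation, the loop can be rewritten as an RDD computation. A hedged sketch, assuming a local master and reusing the question's (hypothetical) output path; saveAsTextFile stands in for the nonexistent writeHadoop:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DistributedSum {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("sum").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Building the input range still happens on the driver...
        List<Integer> range = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) range.add(i);

        // ...but the reduction is distributed across the executors.
        long sum = sc.parallelize(range)
                     .map(i -> (long) i)   // widen to long: the sum overflows int
                     .reduce(Long::sum);

        // A real Spark write action, replacing the placeholder writeHadoop.
        sc.parallelize(Collections.singletonList(sum))
          .saveAsTextFile("/user/text/test.txt");  // path taken from the question

        sc.stop();
    }
}
```

The map/reduce pair runs on the workers; only the tiny final result travels back to the driver, which is the separation the question asks about.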

Upvotes: 2
