Reputation: 2091
I have set up a new standalone Apache Spark server on a freshly installed Ubuntu server. I am trying to send my first job there and I am not having much success.
Here is what I do locally:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Run against the local in-process master, using all available cores
SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("local[*]");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.setLogLevel("WARN");
SQLContext sqlContext = new SQLContext(javaSparkContext);
System.out.println("Hello, Remote Spark v." + javaSparkContext.version());

// Load the JSON file, then count schools per district
DataFrame df;
df = sqlContext.read().option("dateFormat", "yyyy-mm-dd")
        .json("./src/main/resources/north-carolina-school-performance-data.json");
df = df.withColumn("district", df.col("fields.district"));
df = df.groupBy("district").count().orderBy(df.col("district"));
df.show(150);
It works: it displays the names of the school districts in NC along with the number of schools in each district:
Hello, Remote Spark v.1.6.1
+--------------------+-----+
| district|count|
+--------------------+-----+
|Alamance-Burlingt...| 34|
|Alexander County ...| 10|
|Alleghany County ...| 4|
|Anson County Schools| 10|
| Ashe County Schools| 5|
|Asheboro City Sch...| 8|
...
Now, if I change the first line to:
SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("spark://10.0.100.120:7077");
It only gets as far as:
Hello, Remote Spark v.1.6.1
16/07/12 10:58:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/12 10:58:49 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The first thing that seems strange (to me) is that the server has Spark 1.6.2, yet the output still reports 1.6.1; I was expecting to see 1.6.2 as the version number.
Then I go to the web UI and see:
If I click on app-20160712105816-0011, I get:
Clicking on any other link brings me back to my local Apache Spark instance.
After clicking around, I can see something like:
And if I look at the log on the server I see:
16/07/12 10:37:00 INFO Master: Registered app myFirstJob with ID app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Received unregister request from application app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Removing app app-20160712103700-0009
16/07/12 10:37:03 INFO Master: 10.0.100.100:54396 got disassociated, removing it.
16/07/12 10:37:03 INFO Master: 10.0.100.100:54392 got disassociated, removing it.
16/07/12 10:50:44 INFO Master: Registering app myFirstJob
16/07/12 10:50:44 INFO Master: Registered app myFirstJob with ID app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Received unregister request from application app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Removing app app-20160712105044-0010
16/07/12 10:51:20 INFO Master: 10.0.100.100:54682 got disassociated, removing it.
16/07/12 10:51:20 INFO Master: 10.0.100.100:54680 got disassociated, removing it.
16/07/12 10:58:16 INFO Master: Registering app myFirstJob
16/07/12 10:58:16 INFO Master: Registered app myFirstJob with ID app-20160712105816-0011
Which all seems ok to me...
I had a previous (unresolved) question, Apache Spark Server installation requires Hadoop? Not automatically installed?, about the same environment, but this is a completely different, much smaller, app.
Any clue on what is going on?
Upvotes: 0
Views: 567
Reputation: 11383
According to the Web UI screenshot, your server has no worker (slave). Spark provides a few scripts to start the cluster.
List your worker hostnames, one per line, in conf/slaves.
If your cluster is configured correctly, then you only need to call start-all.sh
on the master machine to start everything.
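As a minimal sketch (assuming Spark lives at $SPARK_HOME on the master, and "worker-node-1" is a placeholder for your actual worker host or IP):

# conf/slaves on the master: one worker hostname per line
# ("worker-node-1" is a placeholder; replace it with your real host)
worker-node-1

# then, on the master machine:
$SPARK_HOME/sbin/start-all.sh

Once a worker has registered, it should show up under "Workers" on the master UI (port 8080 by default), and the "Initial job has not accepted any resources" warning should go away.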
Upvotes: 1