Victor

Reputation: 17097

Confused by SparkContext import statements

I am trying to learn Apache Spark and cannot wrap my head around this:

import spark.SparkContext
import SparkContext._

Why do we need the second line, which looks almost like the first? And what does the '._' after SparkContext mean?

Upvotes: 0

Views: 97

Answers (1)

Ged

Reputation: 18033

You do not need to execute the 2nd line, import SparkContext._ . In Scala, ._ is a wildcard import: it brings every member of the SparkContext companion object into scope. Before Spark 1.3 that object held the implicit conversions that add extra methods to certain RDDs (e.g. reduceByKey on RDDs of pairs), but those implicits have since been made available automatically, so the import is redundant. Given the old approach of, say, Spark 1.6.x for a self-contained Spark App, the following from https://github.com/mk6502/spark-1.6-scala-boilerplate/blob/master/src/main/scala/HelloSpark.scala clearly and briefly demonstrates this:

import org.apache.spark.{SparkContext, SparkConf}

object HelloSpark {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("hello spark").setMaster("local"))

    val rdd = sc.parallelize(Array(1, 2, 3, 4, 5))

    println("count: ")
    println(rdd.count())

    sc.stop()
  }
}

In notebooks and in spark-shell, the settings, configs and entry points are created for you automatically.
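For example, in spark-shell (and, with minor differences, most notebook environments) something like this works with no imports or setup at all (a minimal sketch; the exact pre-defined names can vary by notebook product, but sc is standard in spark-shell):

// sc, a ready-made SparkContext, is predefined in the shell/notebook session;
// on Spark 2.x+ a SparkSession named spark is predefined as well
sc.parallelize(1 to 5).count()   // Long = 5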

As stated in my comment, move on to Spark 2.x / 3.x and look at SparkSession; see https://data-flair.training/forums/topic/sparksession-vs-sparkcontext-in-apache-spark/
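A minimal SparkSession equivalent of the boilerplate above might look like this (a sketch; the object name HelloSparkSession is just illustrative):

import org.apache.spark.sql.SparkSession

object HelloSparkSession {
  def main(args: Array[String]): Unit = {
    // SparkSession is the single entry point in Spark 2.x/3.x; the old
    // SparkContext still exists underneath, reachable as spark.sparkContext
    val spark = SparkSession.builder()
      .appName("hello spark session")
      .master("local")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(Array(1, 2, 3, 4, 5))
    println("count: " + rdd.count())

    spark.stop()
  }
}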

The 1.6 Spark Guide on Self-Contained Applications does indeed include the 2nd line, yet its example never explicitly references anything that import provides. E.g.

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
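For instance, reduceByKey on a pair RDD is exactly the kind of method that import used to enable, via the implicit conversion rddToPairRDDFunctions that lived in the SparkContext companion object before Spark 1.3. A small sketch (the object name PairExample is illustrative):

import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3 this wildcard import was required: it pulled in
// the implicit conversions that add reduceByKey and friends to RDDs of pairs.
// From 1.3 onwards it is redundant but harmless.
import org.apache.spark.SparkContext._

object PairExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pairs").setMaster("local"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // reduceByKey lives on PairRDDFunctions, reached through an implicit conversion
    pairs.reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}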

Upvotes: 1
