user4046073

Reputation: 871

How to create Spark/Scala project in IntelliJ IDEA (fails to resolve dependencies in build.sbt)?

I'm trying to build and run a Scala/Spark project in IntelliJ IDEA.

I have added org.apache.spark:spark-sql_2.11:2.0.0 to the global libraries, and my build.sbt looks like below.

name := "test"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"

I still get an error that says

unknown artifact. unable to resolve or indexed

under spark-sql.

When I tried to build the project, the error was

Error:(19, 26) not found: type sqlContext
val sqlContext = new sqlContext(sc)

I have no idea what the problem could be. How to create a Spark/Scala project in IntelliJ IDEA?

Update: Following the suggestions, I updated the code to use SparkSession, but it is still unable to read a CSV file. What am I doing wrong here? Thank you!

val spark = SparkSession
  .builder()
  .appName("Spark example")
  .config("spark.some.config.option", "some value")
  .getOrCreate()

import spark.implicits._

val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show()  // it doesn't show anything
//pdf.select("DATE_KEY").show()

Upvotes: 2

Views: 4184

Answers (2)

Ramesh Maharjan

Reputation: 41957

The sql part should be in upper case letters, as below:

val sqlContext = new SQLContext(sc)

SQLContext is deprecated in newer versions of Spark, so I would suggest you use SparkSession:

val spark = SparkSession.builder().appName("testings").getOrCreate 
val sqlContext = spark.sqlContext

If you want to set the master through your code instead of from the spark-submit command, then you can set .master as well (you can set configs too):

val spark = SparkSession.builder().appName("testings").master("local").config("configuration key", "configuration value").getOrCreate 
val sqlContext = spark.sqlContext

Update

Looking at your sample data

DATE|PID|TYPE
8/03/2017|10199786|O

and testing your code

val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show()

I had output as

+--------------------+
|                 _c0|
+--------------------+
|       DATE|PID|TYPE|
|8/03/2017|10199786|O|
+--------------------+

Now, adding .option calls for the delimiter and the header as

val testdf2 = spark.read.option("delimiter", "|").option("header", true).csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf2.show()

Output was

+---------+--------+----+
|     DATE|     PID|TYPE|
+---------+--------+----+
|8/03/2017|10199786|   O|
+---------+--------+----+

Note: I have used .master("local") for the SparkSession object.
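Putting the pieces together, a minimal self-contained sketch of the whole read (reusing the file path from the question, which is an assumption on my side) would be:

import org.apache.spark.sql.SparkSession

// local master so it runs straight from the IDE instead of via spark-submit
val spark = SparkSession.builder()
  .appName("Spark example")
  .master("local")
  .getOrCreate()

// pipe-delimited file with a header row, as in the sample data above
val testdf = spark.read
  .option("delimiter", "|")
  .option("header", true)
  .csv("/Users/H/Desktop/S_CR_IP_H.dat")

testdf.show()
spark.stop()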

Upvotes: 1

Jacek Laskowski

Reputation: 74629

(That should really be part of the Spark official documentation)

Replace the following in your build.sbt configuration:

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"

with the following:

// the latest Scala version that is compatible with Spark
scalaVersion := "2.11.11"

// A few changes here
// 1. Use double %% so you don't have to worry about the Scala version
// 2. I doubt you need the spark-core dependency
// 3. Use the latest Spark version
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"

Don't worry about IntelliJ IDEA telling you the following:

unknown artifact. unable to resolve or indexed

It's just something you have to live with and the only solution I could find is to...accept the annoyance.

val sqlContext = new sqlContext(sc)

The real type is SQLContext, but as the scaladoc says:

As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.

Please use SparkSession instead, which its scaladoc describes as:

The entry point to programming Spark with the Dataset and DataFrame API.

See the official Spark documentation to read up on SparkSession and other goodies. Start from Getting Started. Have fun!
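As a minimal sketch of that entry point in action (the master setting and the sample row are only illustrative here, the row being taken from the sample data above):

import org.apache.spark.sql.SparkSession

// SparkSession is the single entry point in Spark 2.x
val spark = SparkSession.builder()
  .appName("test")
  .master("local[*]")   // only needed when running inside the IDE
  .getOrCreate()

import spark.implicits._

// a tiny in-memory Dataset, just to exercise the Dataset/DataFrame API
val ds = Seq(("8/03/2017", 10199786, "O")).toDS()
ds.show()

// the legacy SQLContext is still reachable if an old API needs it
val sqlContext = spark.sqlContext

spark.stop()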

Upvotes: 1
