Reputation: 871
I'm trying to build and run a Scala/Spark project in IntelliJ IDEA.
I have added org.apache.spark:spark-sql_2.11:2.0.0 to Global Libraries, and my build.sbt looks like this:
name := "test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"
I still get an error under spark-sql that says:
unknown artifact. unable to resolve or indexed
When I tried to build the project, the error was:
Error:(19, 26) not found: type sqlContext
val sqlContext = new sqlContext(sc)
I have no idea what the problem could be. How do I create a Spark/Scala project in IntelliJ IDEA?
Update:
Following the suggestions, I updated the code to use SparkSession, but it is still unable to read a CSV file. What am I doing wrong here? Thank you!
val spark = SparkSession
.builder()
.appName("Spark example")
.config("spark.some.config.option", "some value")
.getOrCreate()
import spark.implicits._
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show() // it doesn't show anything
//pdf.select("DATE_KEY").show()
Upvotes: 2
Views: 4184
Reputation: 41957
The sql in the type name should be in upper-case letters, as below:
val sqlContext = new SQLContext(sc)
SQLContext is deprecated in newer versions of Spark, so I would suggest you use SparkSession:
val spark = SparkSession.builder().appName("testings").getOrCreate
val sqlContext = spark.sqlContext
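With the session in hand, anything you used to do through the old SQLContext can be done on the session directly. A minimal sketch (the example DataFrame and view name are made up for illustration):
// Hypothetical illustration: the DataFrame and view name are made up.
val df = spark.range(5).toDF("id")     // small example DataFrame
df.createOrReplaceTempView("numbers")  // register it so SQL can see it

// what used to be sqlContext.sql(...) now lives on the session:
spark.sql("SELECT id FROM numbers WHERE id > 2").show()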
If you want to set the master through your code instead of from the spark-submit command, then you can set .master as well (you can set configs too):
val spark = SparkSession.builder().appName("testings").master("local").config("configuration key", "configuration value").getOrCreate
val sqlContext = spark.sqlContext
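If you want to double-check what you set, the values can be read back from the session afterwards (a small sketch reusing the made-up config key from the snippet above):
val masterUrl = spark.sparkContext.master            // "local"
val someValue = spark.conf.get("configuration key")  // "configuration value"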
Update
Looking at your sample data
DATE|PID|TYPE
8/03/2017|10199786|O
and testing your code
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show()
I got the following output:
+--------------------+
| _c0|
+--------------------+
| DATE|PID|TYPE|
|8/03/2017|10199786|O|
+--------------------+
Now, adding .option calls for the delimiter and the header:
val testdf2 = spark.read.option("delimiter", "|").option("header", true).csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf2.show()
The output was:
+---------+--------+----+
| DATE| PID|TYPE|
+---------+--------+----+
|8/03/2017|10199786| O|
+---------+--------+----+
Note: I have used .master("local") for the SparkSession object.
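If you would rather not depend on header inference at all, you can also pass an explicit schema to the reader. A sketch under the assumption that the file has exactly the three columns from the sample above (all read as plain strings here):
import org.apache.spark.sql.types.{StructField, StructType, StringType}

// column names mirror the sample header DATE|PID|TYPE
val schema = StructType(Seq(
  StructField("DATE", StringType),
  StructField("PID", StringType),
  StructField("TYPE", StringType)
))

val testdf3 = spark.read
  .option("delimiter", "|")
  .option("header", true)  // still needed so the header row is skipped
  .schema(schema)
  .csv("/Users/H/Desktop/S_CR_IP_H.dat")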
Upvotes: 1
Reputation: 74629
(That should really be part of the Spark official documentation)
Replace the following in your build.sbt configuration:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"
with the following:
// the latest Scala version that is compatible with Spark
scalaVersion := "2.11.11"
// A few changes here:
// 1. Use %% so you don't have to worry about the Scala version suffix
// 2. I doubt you need the spark-core dependency (spark-sql pulls it in transitively)
// 3. Use the latest Spark version
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
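For reference, the complete build.sbt then reduces to roughly this (a minimal sketch; keep your own name and version):
name := "test"

version := "1.0"

// the latest Scala version that is compatible with Spark
scalaVersion := "2.11.11"

// %% appends the Scala binary version (_2.11) to the artifact name for you
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"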
Don't worry about IntelliJ IDEA telling you the following:
unknown artifact. unable to resolve or indexed
It's just something you have to live with and the only solution I could find is to...accept the annoyance.
val sqlContext = new sqlContext(sc)
The real type is SQLContext, but as the scaladoc says:
As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility. Please use SparkSession instead.
And SparkSession's own scaladoc describes it as: The entry point to programming Spark with the Dataset and DataFrame API.
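In practice, that single entry point covers both APIs. A minimal sketch (the case class and data are made up for illustration):
import org.apache.spark.sql.SparkSession

// Hypothetical example type, made up for this sketch.
case class Person(name: String, age: Int)

val spark = SparkSession.builder().appName("demo").master("local").getOrCreate()
import spark.implicits._  // enables .toDS/.toDF on local collections and $"col" syntax

val people = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()  // Dataset API
people.filter($"age" > 26).show()                              // DataFrame-style column expression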
See the official Spark documentation to read about SparkSession and other goodies. Start from Getting Started. Have fun!
Upvotes: 1