Reputation: 65
When initialising Spark from the command-line interface (spark-shell), a SparkContext is initialised as sc and an SQLContext as sqlContext by default.
But I need a HiveContext, as I am using the function collect_list,
which is not supported by a plain SQLContext but is supported by HiveContext. Since the shell's sqlContext is supposed to be a HiveContext (HiveContext is a subclass of SQLContext), it should have worked, but it doesn't.
How do I initialise a HiveContext in Scala using the Spark CLI?
Upvotes: 1
Views: 4038
Reputation: 534
You can do so with the following steps:
import org.apache.spark.sql.hive.HiveContext
// Build a HiveContext on top of the shell's existing SparkContext (sc)
val sqlContext = new HiveContext(sc)
// Query a Hive table through the new context
val depts = sqlContext.sql("select * from departments")
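For instance, with the HiveContext in place, Hive's collect_list UDAF becomes callable through HiveQL. A minimal sketch, assuming a departments table with dept_id and dept_name columns (placeholder names, not from the question):
// collect_list here is Hive's built-in aggregate, resolved by the
// HiveContext's HiveQL parser; table and column names are assumptions.
val grouped = sqlContext.sql(
  "select dept_id, collect_list(dept_name) as names from departments group by dept_id")
grouped.show()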
Upvotes: 3
Reputation: 40370
In spark-shell, sqlContext is an instance of HiveContext by default. You can read about that in my previous answer here.
Nevertheless, collect_list
isn't available in Spark 1.5.2. It was introduced in Spark 1.6, so it's normal that you can't find it.
Also, you don't need to import org.apache.spark.sql.functions._
in the shell; it's imported by default.
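Before relying on collect_list, it can help to confirm which version the shell is actually running; the SparkContext the shell creates exposes it:
// sc is the shell's SparkContext; the Spark SQL collect_list function
// needs version 1.6 or later (older versions can go through HiveQL instead).
sc.version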
Upvotes: 2
Reputation: 4925
The sqlContext in the shell is already a HiveContext:
scala> sqlContext
res11: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@4756c8f3
[Edit]
Import the functions before using them:
import org.apache.spark.sql.functions._
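With that import in place, collect_list can also be used through the DataFrame API (Spark 1.6 or later). A minimal sketch; the departments table and its dept_id/dept_name columns are assumed names for illustration:
// Collect the dept_name values into one array per dept_id;
// table and column names are placeholders, not from the question.
val grouped = sqlContext.table("departments")
  .groupBy("dept_id")
  .agg(collect_list("dept_name").as("names"))
grouped.show()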
Upvotes: 1