user5768866

HiveContext vs spark sql

I am trying to compare Spark SQL with HiveContext. May I know the difference? Does HiveContext's sql run a Hive query, while Spark SQL runs a Spark query?

Below is my code:

import pyspark
from pyspark.sql import HiveContext

sc = pyspark.SparkContext.getOrCreate(conf=conf)
sqlContext = HiveContext(sc)
sqlContext.sql('select * from table')

While with Spark SQL:

spark.sql('select * from table')

May I know the difference between these two?

Upvotes: 2

Views: 5740

Answers (1)

Lakshman Battini

Reputation: 1912

SparkSession provides a single point of entry to interact with underlying Spark functionality and allows programming Spark with DataFrame and Dataset APIs. Most importantly, it curbs the number of concepts and constructs a developer has to juggle while interacting with Spark.

SparkSession, without explicitly creating SparkConf, SparkContext or SQLContext, encapsulates them within itself.

SparkSession has merged SQLContext and HiveContext into one object since Spark 2.0.

When building a session object, for example:

val spark = SparkSession.builder()
  .appName("SparkSessionExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

.enableHiveSupport() provides the HiveContext functionality, so you will be able to access Hive tables because the SparkSession is initialized with Hive support.

So, there is no difference between "sqlContext.sql" and "spark.sql", but it is advised to use "spark.sql", since spark is the single point of entry for all the Spark APIs.

Upvotes: 7
