user2295633

Does SparkSession always use Hive Context?

I can use SparkSession to get the list of tables in Hive, or access a Hive table, as shown in the code below. Now my question is: in this case, am I using Spark with a Hive context?

Or, to use a Hive context in Spark, must I use the HiveContext object directly to access tables and perform other Hive-related functions?

spark.catalog.listTables.show
val personnelTable = spark.catalog.getTable("personnel")

Upvotes: 3

Views: 3344

Answers (2)

Rajendra Pallala

Reputation: 74

In spark-shell, we can also use spark.conf.getAll. This returns the Spark session configuration; if it contains "spark.sql.catalogImplementation -> hive", Hive support is enabled.
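For example, in spark-shell (a sketch; the exact keys and output format can vary by Spark version):

scala> spark.conf.getAll.get("spark.sql.catalogImplementation")
res0: Option[String] = Some(hive)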

Upvotes: 2

Jacek Laskowski

Reputation: 74639

I can use SparkSession to get the list of tables in Hive, or access a Hive table as shown in the code below.

Yes, you can!

Now my question is: in this case, am I using Spark with a Hive context?

It depends on how you created the spark value.

SparkSession has the Builder interface, which comes with the enableHiveSupport method.

enableHiveSupport(): Builder Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.

If you used that method, you've got Hive support. If not, well, you don't have it.
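For example, a minimal sketch of creating a Hive-aware SparkSession (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

// Hive-aware session: persistent metastore, Hive serdes, Hive UDFs
val spark = SparkSession.builder()
  .appName("my-hive-app")  // hypothetical name for illustration
  .enableHiveSupport()
  .getOrCreate()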

You may think that spark.catalog is somehow related to Hive. Well, it was meant to offer Hive support, but by default the catalog is in-memory.

catalog: Catalog Interface through which the user may create, drop, alter or query underlying databases, tables, functions etc.

spark.catalog is just an interface for which Spark SQL ships two implementations: in-memory (the default) and hive.

Now, you might be asking yourself this question:

Is there any way, such as through spark.conf, to find out whether Hive support has been enabled?

There's no isHiveEnabled method (or similar) that I know of for checking whether you're working with a Hive-aware SparkSession. As a matter of fact, you usually don't need one: you're in charge of creating the SparkSession instance, so you should know what your Spark application does.

In environments where you're given a SparkSession instance (e.g. spark-shell or Databricks), the only way to check whether a particular SparkSession has Hive support enabled is to look at the type of the catalog implementation.

scala> spark.sessionState.catalog
res1: org.apache.spark.sql.catalyst.catalog.SessionCatalog = org.apache.spark.sql.hive.HiveSessionCatalog@4aebd384

If you see HiveSessionCatalog used, the SparkSession instance is Hive-aware.
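If you'd rather check programmatically, a hedged sketch is to inspect the class name of the session catalog (this leans on Spark's internal API, which may change between versions):

// true when the session was built with enableHiveSupport
val isHiveAware =
  spark.sessionState.catalog.getClass.getName.contains("HiveSessionCatalog")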

Upvotes: 6
