Reputation:
I can use SparkSession to get the list of tables in Hive, or access a Hive table, as shown in the code below. Now my question is: in this case, am I using Spark with a Hive context?
Or, to use the Hive context in Spark, must I use the HiveContext object directly to access tables and perform other Hive-related functions?
spark.catalog.listTables.show
val personnelTable = spark.catalog.getTable("personnel")
Upvotes: 3
Views: 3344
Reputation: 74
In spark-shell, we can also use spark.conf.getAll. This command returns the Spark session configuration, and seeing "spark.sql.catalogImplementation -> hive" there suggests Hive support is enabled.
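For example, a minimal sketch of that check in the shell (the fallback value passed to getOrElse is just an assumption for sessions where the key is not listed explicitly):
// Read the catalog implementation from the session configuration;
// "hive" means Hive support is enabled, "in-memory" is the plain default.
val catalogImpl = spark.conf.getAll.getOrElse("spark.sql.catalogImplementation", "in-memory")
println(s"spark.sql.catalogImplementation = $catalogImpl")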
Upvotes: 2
Reputation: 74639
I can use SparkSession to get the list of tables in Hive, or access a Hive table as shown in the code below.
Yes, you can!
Now my question is: in this case, am I using Spark with a Hive context?
It depends on how you created the spark value.
SparkSession has the Builder interface that comes with the enableHiveSupport method.
enableHiveSupport(): Builder Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
If you used that method, you've got Hive support. If not, well, you don't have it.
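For instance, a minimal sketch of building a Hive-aware session (the app name is just a placeholder, and the Hive classes must be on the classpath):
import org.apache.spark.sql.SparkSession

// enableHiveSupport switches the catalog implementation from the default
// in-memory one to hive.
val spark = SparkSession.builder()
  .appName("hive-aware-app")
  .enableHiveSupport()
  .getOrCreate()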
You may think that spark.catalog is somehow related to Hive. Well, it was meant to offer Hive support, but by default the catalog is in-memory.
catalog: Catalog Interface through which the user may create, drop, alter or query underlying databases, tables, functions etc.
spark.catalog is just an interface for which Spark SQL comes with two implementations - in-memory (default) and hive.
Now, you might be asking yourself this question:
Is there any way, such as through spark.conf, to find out if Hive support has been enabled?
There's no isHiveEnabled method or similar that I know of that you could use to tell whether you're working with a Hive-aware SparkSession or not (as a matter of fact, you don't need such a method, since you're in charge of creating the SparkSession instance, so you should know what your Spark application does).
In environments where you're given a SparkSession instance (e.g. spark-shell or Databricks), the only way to check whether a particular SparkSession has Hive support enabled would be to look at the type of the catalog implementation.
scala> spark.sessionState.catalog
res1: org.apache.spark.sql.catalyst.catalog.SessionCatalog = org.apache.spark.sql.hive.HiveSessionCatalog@4aebd384
If you see HiveSessionCatalog used, the SparkSession instance is Hive-aware.
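If you prefer a boolean check, one sketch (assuming a Spark version where spark.sessionState is accessible from the shell, as in the snippet above) is to inspect the catalog's runtime class by name, which avoids referencing the Hive class directly:
// true when the underlying session catalog is the Hive-aware implementation
val isHiveAware = spark.sessionState.catalog.getClass.getSimpleName == "HiveSessionCatalog"
println(s"Hive support enabled: $isHiveAware")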
Upvotes: 6