Amine Mokrani

Reputation: 11

How can I show Hive tables using PySpark?

Hello, I created a Spark HDInsight cluster on Azure and I'm trying to read Hive tables with PySpark, but the problem is that it only shows me the default database.

Does anyone have an idea?

Upvotes: 1

Views: 1458

Answers (3)

Renato Aguiar

Reputation: 91

If you are using HDInsight 4.0, Spark and Hive no longer share metadata.

By default you will not see Hive tables from PySpark; this is a problem I describe in this post: How to save/update a table in Hive so that it is readable in Spark.

That said, here are some things you can try:

  1. If you only want to test on the head node, you can edit hive-site.xml: change the property "metastore.catalog.default" to the value hive, then open pyspark from the command line.
  2. If you want to apply the change to all cluster nodes, it needs to be made in Ambari:
    • Log in to Ambari as admin
    • Go to Spark2 > Configs > hive-site-override
    • Again, change the property "metastore.catalog.default" to the value hive
    • Restart all required services from the Ambari panel

These changes set the Hive metastore catalog as the default. You should now see the Hive databases and tables, but depending on the table structure you may not see the table data properly.
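A per-session alternative to editing hive-site.xml is to set the same property when building the SparkSession. This is only a sketch, relying on Spark forwarding spark.hadoop.* settings to the underlying Hadoop configuration:

from pyspark.sql import SparkSession

# spark.hadoop.* properties are forwarded to the Hadoop configuration,
# so this sets metastore.catalog.default without editing cluster files
spark = (SparkSession.builder
         .config("spark.hadoop.metastore.catalog.default", "hive")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show databases").show()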

Upvotes: 1

Yukeshkumar

Reputation: 534

You are missing the Hive server details in your SparkSession. If you haven't added any, it will create and use the default database to run Spark SQL.

If you've added the configuration details spark.sql.warehouse.dir and spark.hadoop.hive.metastore.uris to the Spark defaults conf file, then add enableHiveSupport() while creating the SparkSession.
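For reference, those entries in spark-defaults.conf would look like the following; the warehouse path and metastore host are placeholders, so use your cluster's values:

spark.sql.warehouse.dir /user/hive/warehouse
spark.hadoop.hive.metastore.uris thrift://localhost:9083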

Otherwise, add the configuration details while creating the SparkSession:

.config("spark.sql.warehouse.dir","/user/hive/warehouse")
.config("hive.metastore.uris","thrift://localhost:9083")
.enableHiveSupport()
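Once the session is created with Hive support, a quick sanity check (the database name below is a placeholder):

spark.catalog.listDatabases()  # should list more than just "default"
spark.sql("show tables in your_database").show()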

Upvotes: 0

过过招

Reputation: 4234

If you have created tables in other databases, try show tables from database_name. Replace database_name with the actual name.
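In PySpark that looks like the following, with database_name being the placeholder from the answer:

spark.sql("show tables from database_name").show()

# or switch the current database first, then list its tables
spark.catalog.setCurrentDatabase("database_name")
spark.sql("show tables").show()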

Upvotes: 0
