skybluelee

Reputation: 13

How can I change the location of the default database for the warehouse? (Spark)

    ...
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
        <description>location of default database for the warehouse</description>
    </property>
    ...

This snippet is part of /user/spark3/conf/hive-site.xml.

At first, the value was:

hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7

Then I changed the value to:

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7

Below are the code and its result:

println(spark.conf.get("spark.sql.warehouse.dir"))  //--Default : spark-warehouse

spark
    .sql("""
        SELECT 
            website, 
            avg(age) avg_age, 
            max(id) max_id
        FROM 
            people a 
            JOIN 
            projects b 
            ON a.name = b.manager 
        WHERE 
            a.age > 11 
        GROUP BY 
            b.website
        """)
    .write
    .mode("overwrite")  //--Overwrite mode....
    .saveAsTable("JoinedPeople")  //--saveAsTable(<warehouse_table_name>)....
    
sql("SELECT * FROM JoinedPeople").show(1000)

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
|             website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...|   30.0|     2|
|http://kafka.apac...|   19.0|     3|
|http://storm.apac...|   30.0|     9|
+--------------------+-------+------+

The value of 'spark.sql.warehouse.dir' was changed from kikang to skybluelee, as I wanted.

But the location of the table "JoinedPeople" doesn't change: it is still 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7', the first value from hive-site.xml.

I want to change the location of the default database. How can I change the default location?

I also changed 'spark-defaults.conf' and, of course, restarted the Ubuntu machine, but that wasn't effective either.
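For reference, the line I edited in spark-defaults.conf looked something like this (the property name is the one Spark actually reads; the HDFS path simply mirrors the one from hive-site.xml):

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.sql.warehouse.dir    hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
```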

Upvotes: 0

Views: 1034

Answers (2)

skybluelee

Reputation: 13

I found what I missed!

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://spark-worker-01:3306/metastore_db?createDatabaseIfNotExist=true</value>
  <description>metadata is stored in a MySQL server</description>
</property>

This property points the metastore at MySQL 5.7. The metastore_db was created on my very first attempt, so even after I changed the warehouse location in hive-site.xml, the existing metastore still kept the old location.
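In other words, the metastore records the location at the time the database is created, and editing hive-site.xml afterwards does not update it. A sketch of how one could inspect and re-point it from Spark SQL (assuming Spark 3.0+, where `ALTER DATABASE ... SET LOCATION` is available; note it only affects tables created afterwards):

```sql
-- Show the location currently recorded for the default database
DESCRIBE DATABASE EXTENDED default;

-- Re-point the default database at the new warehouse path
-- (existing tables keep their old locations)
ALTER DATABASE default SET LOCATION
  'hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7';
```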

Upvotes: 1

Scott Hsieh

Reputation: 1493

Could you check which Spark version you're using in this scenario? According to the Hive Tables page in the official Spark documentation:

Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.

  1. Does changing the property in hive-site.xml work for you (assuming the Spark version is above 2.0.0)?

     ...
     <property>
         <name>spark.sql.warehouse.dir</name>
         <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
         <description>location of default database for the warehouse</description>
     </property>
     ...
    
  2. Does setting the property before the initialization of a Spark session work for you?

    import org.apache.spark.sql.SparkSession
    
    val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"
    
    // Create a SparkSession with the desired warehouse location
    val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    
    // Import the necessary Spark functions and implicit
    import spark.implicits._
    import spark.sql
    sql("""
     SELECT 
         website, 
         avg(age) avg_age, 
         max(id) max_id
     FROM 
         people a 
         JOIN 
         projects b 
         ON a.name = b.manager 
     WHERE 
         a.age > 11 
     GROUP BY 
         b.website
     """)
       .write
       .mode("overwrite")
       .saveAsTable("JoinedPeople")
    
    // Retrieve the location of the "JoinedPeople" table from the Hive metastore
    val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople")
      .filter($"col_name" === "Location")
      .select("data_type")
      .collect()(0)(0)
    println(s"Table location: $tableLocation")

Upvotes: 0
