Reputation: 13
...
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
the code is a part of /user/spark3/conf/hive-site.xml
At first the value was
hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7
And I changed the value
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
Below there is a code and result
println(spark.conf.get("spark.sql.warehouse.dir")) //--Default : spark-warehouse
spark
.sql("""
SELECT
website,
avg(age) avg_age,
max(id) max_id
FROM
people a
JOIN
projects b
ON a.name = b.manager
WHERE
a.age > 11
GROUP BY
b.website
""")
.write
.mode("overwrite") //--Overwrite mode....
.saveAsTable("JoinedPeople") //--saveAsTable(<warehouse_table_name>)....
sql("SELECT * FROM JoinedPeople").show(1000)
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
| website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...| 30.0| 2|
|http://kafka.apac...| 19.0| 3|
|http://storm.apac...| 30.0| 9|
+--------------------+-------+------+
value 'spark.sql.warehouse.dir' was changed kikang into skybluelee as I want.
but the location of table "JoinedPeople" doesn't change. The location is 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7' - the first value in hive-site.xml
I want to change the location of default database.
How can I change the default location?
I changed 'spark-defaults.conf' and of courese turn off & on the ubuntu. But not effective
Upvotes: 0
Views: 1034
Reputation: 13
I found what I missed!
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://spark-worker-01:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
this is a part of warehouse in mysql_5.7 when I tried first in metastore_db was created, so even I changed the location, that doesn't make change in metastore
Upvotes: 1
Reputation: 1493
May you check what your Spark version is in this scenario? According to Hive Tables in the official Spark documentation:
Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.
Does changing the property in hive-site.xml
work for you (assuming the Spark version is above 2.0.0)?
...
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
Does setting the property before the initialization of a Spark session work for you?
import org.apache.spark.sql.SparkSession
val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"
// Create a SparkSession with the desired warehouse location
val spark = SparkSession
.builder()
.appName("Spark Hive Example")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
// Import the necessary Spark functions and implicit
import spark.implicits._
import spark.sql
sql("""
SELECT
website,
avg(age) avg_age,
max(id) max_id
FROM
people a
JOIN
projects b
ON a.name = b.manager
WHERE
a.age > 11
GROUP BY
b.website
""")
.write
.mode("overwrite")
.saveAsTable("JoinedPeople")
// Retrieve the location of the "JoinedPeople" table from the Hive metastore
val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople").filter($"col_name" === "Location").select("data_type").collect()(0)(0)
println(s"Table location: $tableLocation")
Upvotes: 0