Can anyone help me with this?
I'm getting the error ***Runtime Error: Cannot set database in spark!*** while running a dbt model through the Spark Thrift server with a remote Hive metastore.
I need to transform some models in dbt using Apache Spark as the adapter. For now, I'm running Spark on my local machine. I started the Thrift server as below, pointing it at a remote Hive metastore URI:
```bash
./sbin/start-master.sh
./sbin/start-worker.sh spark://master_url:7077
./sbin/start-thriftserver.sh --master spark://master_url:7077 --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1 --hiveconf hive.metastore.uris=thrift://ip:9083
```
In my dbt project, the profile (`profiles.yml`) is:
```yaml
project_name:
  outputs:
    dev:
      host: localhost
      method: thrift
      port: 10000
      schema: test_dbt
      threads: 4
      type: spark
      user: admin
  target: dev
```
Running `dbt run` fails with the following error:

```
dbt run --select test -t dev
Running with dbt=1.1.0
Partial parse save file not found. Starting full parse.
Encountered an error:
Runtime Error
  Cannot set database in spark!
```
Note that `dbt.log` does not contain much additional information.
SOLUTION
The error was caused by the `database` field in the source YAML file.
From the dbt docs (https://docs.getdbt.com/reference/resource-configs/spark-configs#always-schema-never-database):

> **Always schema, never database**
>
> Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands `database` to exist at a higher level than `schema`. As such, you should never use or set `database` as a node config or in the target profile when running dbt-spark.
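As an illustration of that rule, here is a minimal sketch of a source definition for dbt-spark. The source name `raw`, the Hive database `raw_db`, and the table `events` are hypothetical placeholders, not taken from the original project. The Hive/Spark database goes under `schema`, and no `database` key is set.

```yaml
version: 2

sources:
  - name: raw            # hypothetical source name
    schema: raw_db       # hypothetical Hive/Spark database that holds the table
    # database: raw_db   # leave this out with dbt-spark; a `database` field here is what triggered the error above
    tables:
      - name: events     # hypothetical table name
```

Models then reference the table as `{{ source('raw', 'events') }}`. If `schema` is omitted, dbt uses the source `name` as the schema.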
---
The schema `test_dbt` may not exist in Hive. If so, I think you need to create the `test_dbt` database in Hive first:

1. Log in to the Spark cluster, stop the Thrift server, and run `spark-sql`.
2. Create the database: `CREATE DATABASE test_dbt;`
3. Restart the Thrift server.
Alternatively, you can use the `default` schema, as below:
```yaml
dbt_spark_project:
  outputs:
    dev:
      host: spark-cluster
      method: thrift
      port: 10000
      schema: default
      threads: 4
      type: spark
  target: dev
```