Jithin K J

Reputation: 21

Runtime Error: Cannot set database in spark! [DBT + Spark + Thrift]

Can anyone help me with this? I'm getting the error ***Runtime Error: Cannot set database in spark!*** while running a dbt model via Spark in thrift mode with a remote Hive metastore.

I need to transform some models in dbt using Apache Spark as the adapter. I'm running Spark locally on my machine, and I started the Thrift server as below with the remote Hive metastore URI.

  1. Started master

./sbin/start-master.sh

  2. Started worker

./sbin/start-worker.sh spark://master_url:7077

  3. Started Thrift Server

./sbin/start-thriftserver.sh --master spark://master_url:7077 --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1 --hiveconf hive.metastore.uris=thrift://ip:9083
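Before pointing dbt at the server, it can help to confirm the Thrift endpoint is actually reachable. One way (a sketch, assuming the `beeline` client bundled with the Spark distribution, and the same port/user as the dbt profile):

```
# Connect to the Thrift server on port 10000 and list the schemas
# the metastore knows about. Host, port, and user are assumptions
# matching the profile in this question.
./bin/beeline -u jdbc:hive2://localhost:10000 -n admin -e "SHOW DATABASES;"
```

If this fails, the problem is with the Spark/Thrift setup rather than with dbt.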

In my DBT project,

project_name:
  outputs:
    dev:
      host: localhost
      method: thrift
      port: 10000
      schema: test_dbt
      threads: 4
      type: spark
      user: admin
  target: dev

While executing dbt run, I get the following error.

dbt run --select test -t dev
Running with dbt=1.1.0
Partial parse save file not found. Starting full parse.
Encountered an error:
Runtime Error 
Cannot set database in spark!

Please note that there is not much info in dbt.log.


SOLUTION

This error was caused by the "database" field in the source YAML file.

From the dbt docs (https://docs.getdbt.com/reference/resource-configs/spark-configs#always-schema-never-database):

Always schema, never database: Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands database to exist at a higher level than schema. As such, you should never use or set database as a node config or in the target profile when running dbt-spark.
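For illustration, a hypothetical sources file that triggers this error, and its fix (the source and table names here are made up):

```yaml
# sources.yml -- BROKEN: dbt-spark rejects a `database` key
sources:
  - name: raw              # hypothetical source name
    database: test_dbt     # <- causes "Cannot set database in spark!"
    tables:
      - name: events

# sources.yml -- FIXED: use `schema` instead
sources:
  - name: raw
    schema: test_dbt       # Spark treats schema/database as the same thing
    tables:
      - name: events
```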

Upvotes: 2

Views: 2010

Answers (1)

Ujjawal Mandhani

Reputation: 31

The schema test_dbt does not exist in Hive. I think you need to create the test_dbt database in Hive:

Step 1: Log in to the Spark cluster, stop the Thrift server, and run spark-sql

Step 2: Create the database test_dbt

Step 3: Restart the Thrift server
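The steps above might look like this inside the spark-sql shell (a sketch; the database name is taken from the question):

```sql
-- Run in spark-sql after stopping the Thrift server.
-- Creates the database dbt expects as its target schema.
CREATE DATABASE IF NOT EXISTS test_dbt;

-- Verify it now appears before restarting the Thrift server.
SHOW DATABASES;
```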

OR

you can use the default schema, like below:


dbt_spark_project:
  outputs:
    dev:
      host: spark-cluster
      method: thrift
      port: 10000
      schema: default
      threads: 4
      type: spark
  target: dev

Upvotes: 0
