Amom Mendes

Reputation: 1

How to read tables from a location and write data to a table on another cluster

I read table statistics from one Hive metastore by starting the Spark application with hive.metastore.uris pointing to it. However, I need to write the data to a table in another Hive (on a different cluster).

I've tried clearing the active and default sessions and building another session with the new metastore URI, but Spark keeps trying to write to the table in the first Hive.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
          .appName(appName)
          .enableHiveSupport()
          .config("hive.metastore.uris", FIRST_METASTORE)
          .config("spark.sql.hive.convertMetastoreOrc", "false")
          .config("spark.sql.caseSensitive", "false")
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate()

val df = spark.sql("DESCRIBE FORMATTED source_table")

SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()

val spark2 = SparkSession.builder()
          .appName(appName)
          .enableHiveSupport()
          .config("hive.metastore.uris", NEW_MESTASTORE)
          .config("spark.sql.hive.convertMetastoreOrc", "false")
          .config("spark.sql.caseSensitive", "false")
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate()

SparkSession.setDefaultSession(spark2)
SparkSession.setActiveSession(spark2)

df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .insertInto("other_cluster_table")

As I said, I would expect the dataframe to be written to the table location in the new metastore and catalog, but it isn't. This happens because DataFrameWriter gets the table identifier from df.sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) when inserting into an existing table, so the lookup goes through the session that created the DataFrame. How can I deal with this?
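To make this concrete, here is a minimal check (continuing the snippet above, with the same placeholder names) showing that the DataFrame stays bound to the session that created it, no matter which session is set as active or default:

// insertInto resolves the table name through df.sparkSession's catalog,
// not through the currently active/default session
println(df.sparkSession eq spark)    // true  -> lookups still go to FIRST_METASTORE
println(df.sparkSession eq spark2)   // false -> spark2's catalog is never consulted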

Upvotes: 0

Views: 719

Answers (1)

Amom Mendes

Reputation: 1

After reading about multiple SparkContexts, I solved this by writing the parquet files directly to namenode/directory/to/partition/ and then adding the partition to the table using beeline.
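For reference, a rough sketch of that workaround; the HDFS path, table name, and partition value below are placeholders, not the ones from my actual job:

// write the partition data straight to the target cluster's HDFS,
// bypassing the first session's Hive catalog entirely
df.write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs://other-namenode:8020/path/to/other_cluster_table/dt=2019-01-01")

// then register the partition on the other cluster, e.g. with beeline:
//   beeline -u jdbc:hive2://other-hiveserver2:10000 -e "
//     ALTER TABLE other_cluster_table ADD IF NOT EXISTS
//     PARTITION (dt='2019-01-01')
//     LOCATION '/path/to/other_cluster_table/dt=2019-01-01'"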

Upvotes: 0
