Amom Mendes

Reputation: 1

How to read tables from a location and write data to a table on another cluster

I read table statistics from one Hive metastore by starting the Spark application with hive.metastore.uris pointing to it. However, I need to write the data to a table in another Hive (on a different cluster).

I've tried clearing the active and default sessions and building another session with the new metastore URI, but Spark keeps trying to write to the table in the first Hive.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
          .appName(appName)
          .enableHiveSupport()
          .config("hive.metastore.uris", FIRST_METASTORE)
          .config("spark.sql.hive.convertMetastoreOrc", "false")
          .config("spark.sql.caseSensitive", "false")
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate()

val df = spark.sql("DESCRIBE FORMATTED source_table")

SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()

val spark2 = SparkSession.builder()
          .appName(appName)
          .enableHiveSupport()
          .config("hive.metastore.uris", NEW_MESTASTORE)
          .config("spark.sql.hive.convertMetastoreOrc", "false")
          .config("spark.sql.caseSensitive", "false")
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate()

SparkSession.setDefaultSession(spark2)
SparkSession.setActiveSession(spark2)

df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .insertInto("other_cluster_table")

As I said, I would expect the dataframe to be written to the table location in the new metastore and catalog, but it isn't. This happens because DataFrameWriter gets the table identifier from df.sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) when inserting into an existing table, so the lookup goes through the session that created the DataFrame. How can I deal with this?
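To make this concrete, here is a minimal check (continuing the snippet above, with the same placeholder names) showing that the DataFrame stays bound to the session that created it, no matter which session is set as active or default:

// insertInto resolves the table name through df.sparkSession's catalog,
// not through the currently active/default session
println(df.sparkSession eq spark)    // true  -> lookups still go to FIRST_METASTORE
println(df.sparkSession eq spark2)   // false -> spark2's catalog is never consulted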

Upvotes: 0

Views: 719

Answers (1)

Amom Mendes

Reputation: 1

After reading about multiple SparkContexts, I solved this by writing the parquet files directly to namenode/directory/to/partition/ and then adding the partition to the table using beeline.
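For reference, a rough sketch of that workaround; the HDFS path, table name, and partition value below are placeholders, not the ones from my actual job:

// write the partition data straight to the target cluster's HDFS,
// bypassing the first session's Hive catalog entirely
df.write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs://other-namenode:8020/path/to/other_cluster_table/dt=2019-01-01")

// then register the partition on the other cluster, e.g. with beeline:
//   beeline -u jdbc:hive2://other-hiveserver2:10000 -e "
//     ALTER TABLE other_cluster_table ADD IF NOT EXISTS
//     PARTITION (dt='2019-01-01')
//     LOCATION '/path/to/other_cluster_table/dt=2019-01-01'"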

Upvotes: 0
