Guy Cohen

Reputation: 266

How can I use the "spark.catalog.createTable" function to create a partitioned table?

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.catalog.Catalog

There is an options parameter, but I couldn't find any sample that uses it to pass the partition columns.

Upvotes: 5

Views: 3845

Answers (1)

Mikita Harbacheuski

Reputation: 2253

I believe you don't need to specify partition columns if you don't provide a schema: in that case Spark infers both the schema and the partitioning from the location automatically (there's a one-line sketch of that call at the end of this answer). However, it's not possible to provide both a schema and partitioning with the current implementation. Fortunately, all of the underlying implementation is open, so I ended up with the following method for creating external Hive tables:

  import org.apache.spark.sql.SaveMode
  import org.apache.spark.sql.catalyst.TableIdentifier
  import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
  import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource}
  import org.apache.spark.sql.types.StructType

  // Assumes `spark` is an active SparkSession in scope
  private def createExternalTable(tableName: String, location: String,
      schema: StructType, partitionCols: Seq[String], source: String): Unit = {
    val tableIdent = TableIdentifier(tableName)
    // Storage descriptor pointing at the external data location
    val storage = DataSource.buildStorageFormatFromOptions(Map("path" -> location))
    // Catalog entry carrying both the schema and the partition columns
    val tableDesc = CatalogTable(
      identifier = tableIdent,
      tableType = CatalogTableType.EXTERNAL,
      storage = storage,
      schema = schema,
      partitionColumnNames = partitionCols,
      provider = Some(source)
    )
    // Build and execute the CREATE TABLE plan; toRdd forces execution
    val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
    spark.sessionState.executePlan(plan).toRdd
  }
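
For reference, here is a minimal sketch of how the method above might be called; the table name, location, schema, and partition column are made up for illustration:

  import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

  // Partition columns must also appear in the schema
  val eventSchema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("payload", StringType),
    StructField("dt", StringType)
  ))

  createExternalTable(
    tableName = "events",        // hypothetical table name
    location = "/data/events",   // hypothetical path
    schema = eventSchema,
    partitionCols = Seq("dt"),
    source = "parquet"
  )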
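
And for the schema-inference case mentioned at the top of this answer, the plain Catalog API is enough. A sketch, assuming the files at the path already carry a Hive-style partition layout (names are again made up):

  // Schema and partitioning are inferred from the layout under the path,
  // e.g. /data/events/dt=2018-01-01/part-...
  spark.catalog.createTable("events_inferred", "/data/events", "parquet")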

Upvotes: 5
