Reputation: 266
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.catalog.Catalog
There is an options parameter, but I couldn't find any sample that uses it to pass the partition columns.
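For example (a minimal sketch with made-up names and paths; spark is an existing SparkSession), this is the kind of call I can make today, but I don't see where the partition columns would go:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("year", StringType)   // intended partition column
))

// Registers an external table over the given path, but the options map
// has no documented key for declaring partition columns
spark.catalog.createExternalTable(
  "my_table",
  "parquet",
  schema,
  Map("path" -> "/data/my_table")
)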
Upvotes: 5
Views: 3845
Reputation: 2253
I believe you don't need to specify partition columns if you don't provide a schema; in that case Spark infers both the schema and the partitioning from the location automatically. However, with the current implementation it's not possible to provide both a schema and partitioning through the Catalog API. Fortunately, all of the code in the underlying implementation is open, so I ended up with the following method for creating external Hive tables.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource}
import org.apache.spark.sql.types.StructType

private def createExternalTable(tableName: String, location: String,
                                schema: StructType, partitionCols: Seq[String],
                                source: String): Unit = {
  val tableIdent = TableIdentifier(tableName)
  // Build the storage descriptor pointing at the external location
  val storage = DataSource.buildStorageFormatFromOptions(Map("path" -> location))
  // Catalog entry with both the full schema and the partition column names
  val tableDesc = CatalogTable(
    identifier = tableIdent,
    tableType = CatalogTableType.EXTERNAL,
    storage = storage,
    schema = schema,
    partitionColumnNames = partitionCols,
    provider = Some(source)
  )
  // Plan the CREATE TABLE command and execute it through the session state
  val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
  spark.sessionState.executePlan(plan).toRdd
}
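As a usage sketch (the table name, path, schema and partition columns below are made up, and the MSCK REPAIR step is an extra assumption for picking up partition directories that already exist on disk):

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val eventsSchema = StructType(Seq(
  StructField("id", StringType),
  StructField("payload", StringType),
  StructField("year", IntegerType),   // partition column, included in the schema
  StructField("month", IntegerType)   // partition column, included in the schema
))

// Registers an external, partitioned Parquet table over an existing location
createExternalTable(
  tableName = "events",
  location = "/data/events",
  schema = eventsSchema,
  partitionCols = Seq("year", "month"),
  source = "parquet"
)

// Assumption: if the location already contains year=/month= subdirectories,
// they can be registered in the metastore afterwards
spark.sql("MSCK REPAIR TABLE events")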
Upvotes: 5