Reputation: 817
I am trying to convert a dataframe to hive table in spark Scala. I have read in a dataframe from an XML file. It uses SQL context to do so. I want to convert save this dataframe as a hive table. I am getting this error:
"WARN HiveContext$$anon$1: Could not persist
database_1
.test_table
in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format."
object spark_conversion {
def main(args: Array[String]): Unit = {
if (args.length < 2) {
System.err.println("Usage: <input file> <output dir>")
System.exit(1)
}
val in_path = args(0)
val out_path_csv = args(1)
val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("conversion")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
val df = hiveContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "PolicyPeriod")
.option("attributePrefix", "attr_")
.load(in_path)
df.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save(out_path_csv)
df.saveAsTable("database_1.test_table")
df.printSchema()
df.show()
Upvotes: 3
Views: 7630
Reputation: 817
saveAsTable in spark is not compatible with hive. I am on CDH 5.5.2. Workaround from cloudera website:
df.registerTempTable(tempName)
hsc.sql(s"""
CREATE TABLE $tableName (
// field definitions )
STORED AS $format """)
hsc.sql(s"INSERT INTO TABLE $tableName SELECT * FROM $tempName")
http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html
Upvotes: 4