I'm trying to accomplish a simple thing: writing a DataFrame to a Hive table. Below is the code, written in Java. I'm using the Cloudera VM with no changes.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class JsonToHive {
    public static void main(String[] args) {
        String master = "local[*]";
        SparkSession sparkSession = SparkSession
                .builder().appName(JsonToHive.class.getName())
                //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
                .enableHiveSupport().master(master).getOrCreate();
        sparkSession.sparkContext().setLogLevel("ERROR");
        // read().json() and createOrReplaceTempView() replace the deprecated
        // SQLContext.jsonFile() and registerTempTable() in Spark 2.x
        Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
        rowDataset.printSchema();
        rowDataset.createOrReplaceTempView("employeesData");
        Dataset<Row> firstRow = sparkSession.sql("select employee.firstName, employee.addresses from employeesData");
        firstRow.show();
        sparkSession.catalog().listTables().select("*").show();
        // No mode() given, so the default SaveMode.ErrorIfExists applies
        firstRow.write().saveAsTable("default.employee");
        sparkSession.close();
    }
}
I have created the managed table in Hive using this HQL:
CREATE TABLE employee ( firstName STRING, lastName STRING, addresses ARRAY < STRUCT < street:STRING, city:STRING, state:STRING > > ) STORED AS PARQUET;
I'm reading the data from a simple JSON file, employees.json:
{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}
It fails with "Table default.employee already exists.;" and does not append the content. How can I append the content to the Hive table?
If I set mode("append"), it does not complain, but it does not write the content either:
firstRow.write().mode("append").saveAsTable("default.employee");
Any help will be appreciated. Thanks.
Output of sparkSession.catalog().listTables().select("*").show():
+-------------+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
| employee| default| null| MANAGED| false|
|employeesdata| null| null|TEMPORARY| true|
+-------------+--------+-----------+---------+-----------+
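The two behaviours described in the question line up with Spark's SaveMode: with no mode set, the default ErrorIfExists refuses to write because default.employee exists, while Append is expected to add rows. The following is a toy model, not Spark code; the class SaveModeModel, its Mode enum and the write method are hypothetical and only mirror the documented semantics of the four modes when the target table already exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, NOT Spark code: mirrors the documented behaviour of
// org.apache.spark.sql.SaveMode when the target table already holds rows.
public class SaveModeModel {

    // Same four modes as Spark's SaveMode (Append, Overwrite, ErrorIfExists, Ignore).
    enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    // Returns the table contents after writing `incoming` into `existing`.
    static List<String> write(List<String> existing, List<String> incoming, Mode mode) {
        switch (mode) {
            case APPEND:
                List<String> appended = new ArrayList<>(existing);
                appended.addAll(incoming);          // keep old rows, add new ones
                return appended;
            case OVERWRITE:
                return new ArrayList<>(incoming);   // old rows are replaced
            case IGNORE:
                return new ArrayList<>(existing);   // write is skipped silently
            case ERROR_IF_EXISTS:
            default:
                // What the question hit: the default mode refuses to touch the table
                throw new IllegalStateException("Table default.employee already exists.");
        }
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("Neil");
        List<String> batch = Arrays.asList("Neil");
        System.out.println(write(table, batch, Mode.APPEND));
        try {
            write(table, batch, Mode.ERROR_IF_EXISTS);
        } catch (IllegalStateException e) {
            System.out.println("ErrorIfExists: " + e.getMessage());
        }
    }
}
```

In the question, Append "worked" without error yet the rows never showed up in Hive; the model cannot show that, because the cause was which metastore Spark was talking to, not the mode itself.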
UPDATE:
/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark was not reading the Hive metastore's tables; after adding it to the classpath, it worked fine. I only had this problem because I was running from IntelliJ; in production, the Spark conf folder will have a link to hive-site.xml.
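Since the root cause was classpath visibility, a quick sanity check is to ask the class loader for hive-site.xml from the same IntelliJ run configuration. This is a minimal sketch assuming Hive's configuration code looks the file up as a classpath resource; HiveSiteCheck and findOnClasspath are hypothetical helpers, not part of Spark or Hive. When the file is missing, Spark with enableHiveSupport() will likely fall back to a local embedded metastore in the working directory, which would explain tables that "exist" but are not the Hive ones.

```java
import java.net.URL;

// Hypothetical diagnostic helper, not part of Spark or Hive.
public class HiveSiteCheck {

    // Returns where `resource` is visible on the classpath, or null if it is not.
    static URL findOnClasspath(String resource) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = HiveSiteCheck.class.getClassLoader();
        }
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        URL hiveSite = findOnClasspath("hive-site.xml");
        if (hiveSite == null) {
            // Assumption: Spark then uses a local embedded metastore instead of the VM's Hive
            System.out.println("hive-site.xml is NOT on the classpath");
        } else {
            System.out.println("hive-site.xml found at " + hiveSite);
        }
    }
}
```

In a Maven or Gradle project, copying hive-site.xml into src/main/resources is one simple way to make this check pass when running from the IDE.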
Answer: It looks like you should be using insertInto(String tableName) instead of saveAsTable(String tableName):
firstRow.write().mode("append").insertInto("default.employee");
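One caveat worth knowing: the DataFrameWriter documentation says insertInto ignores column names and resolves columns by position, whereas saveAsTable matches them by name. The question's query yields two columns (firstName, addresses) while the table has three, so Spark is expected to reject the insert until the columns line up. The following toy model illustrates the positional rule; PositionalInsertModel and insertByPosition are hypothetical names, not Spark API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Spark code: position-based column resolution as documented
// for DataFrameWriter.insertInto (i-th data column -> i-th table column).
public class PositionalInsertModel {

    // Maps each table column to the data column at the same position.
    static Map<String, String> insertByPosition(List<String> tableColumns,
                                                List<String> dataColumns) {
        if (tableColumns.size() != dataColumns.size()) {
            throw new IllegalArgumentException("column count mismatch: table has "
                    + tableColumns.size() + ", data has " + dataColumns.size());
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            mapping.put(tableColumns.get(i), dataColumns.get(i)); // names are ignored
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("firstName", "lastName", "addresses");
        try {
            // Columns produced by the question's query: only two of the three
            insertByPosition(table, Arrays.asList("firstName", "addresses"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Selecting lastName as well, in table order, lines the columns up
        System.out.println(insertByPosition(table, table));
    }
}
```

In Spark terms, that means querying "select employee.firstName, employee.lastName, employee.addresses from employeesData" (the JSON does contain lastName) before calling insertInto, keeping the table's column order.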