M80

Reputation: 994

Writing a DataFrame to a Hive table in Java with Apache Spark

I'm trying to do something simple: write a DataFrame to a Hive table. The Java code is below. I'm using the Cloudera VM with no changes.

 public static void main(String[] args) {
    String master = "local[*]";

    SparkSession sparkSession = SparkSession
            .builder().appName(JsonToHive.class.getName())
            //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
            .enableHiveSupport().master(master).getOrCreate();

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");

    SQLContext sqlCtx = sparkSession.sqlContext();
    Dataset<Row> rowDataset = sqlCtx.jsonFile("employees.json");
    rowDataset.printSchema();
    rowDataset.registerTempTable("employeesData");

    Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
    firstRow.show();

    sparkSession.catalog().listTables().select("*").show();

    firstRow.write().saveAsTable("default.employee");
    sparkSession.close();

}

I have created the managed table in Hive using this HQL:

 CREATE TABLE employee ( firstName STRING, lastName STRING, addresses  ARRAY < STRUCT < street:STRING,  city:STRING, state:STRING > > )  STORED AS PARQUET;

I'm reading the data from a simple JSON file, "employees.json":

{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}

Spark reports "Table default.employee already exists.;" and does not append the content. How can I append the content to the Hive table?

If I set mode("append"), it does not complain, but it does not write the content either:

firstRow.write().mode("append").saveAsTable("default.employee");

Any help will be appreciated... thanks.

+-------------+--------+-----------+---------+-----------+
|         name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
|     employee| default|       null|  MANAGED|      false|
|employeesdata|    null|       null|TEMPORARY|       true|
+-------------+--------+-----------+---------+-----------+

UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark was not reading the Hive tables. After I added it to the classpath, it worked fine. I only had this problem because I was running from IntelliJ; in production, the Spark conf folder will have a link to hive-site.xml.
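
For anyone hitting the same thing from an IDE, a quick way to confirm the diagnosis is to check whether hive-site.xml is visible to the class loader before building the session. This is only a minimal sketch: the class name HiveSiteCheck and the helper isOnClasspath are made up for illustration, and it assumes the file is exposed at the classpath root under the name hive-site.xml.

```java
import java.net.URL;

public class HiveSiteCheck {

    // Hypothetical helper: true if the named resource is visible to the
    // current thread's class loader (falls back to the system class loader).
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        URL location = loader.getResource(resourceName);
        return location != null;
    }

    public static void main(String[] args) {
        // Assumption: hive-site.xml sits at the classpath root, e.g. because
        // its folder was added as a resource directory in the IDE.
        if (isOnClasspath("hive-site.xml")) {
            System.out.println("hive-site.xml found on the classpath");
        } else {
            System.out.println("hive-site.xml not found on the classpath");
        }
    }
}
```

If the file is not found, Spark will most likely not be talking to the VM's Hive metastore at all, which would be consistent with the append appearing to succeed while nothing shows up in Hive.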

Upvotes: 0

Views: 3345

Answers (1)

Jack Leow

Reputation: 22477

It looks like you should be calling insertInto(String tableName) instead of saveAsTable(String tableName):

firstRow.write().mode("append").insertInto("default.employee");
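
One thing to watch for: unlike saveAsTable, insertInto ignores column names and matches columns by position, so the DataFrame should have the same columns in the same order as the table. A rough sketch of a guard for that is below. The class name InsertIntoColumnOrder and the helper alignToTable are hypothetical, the matching is case-insensitive on the assumption that Hive reports lower-cased column names, and the code is plain Java so it can be tried without a Spark dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InsertIntoColumnOrder {

    // Hypothetical helper: returns the DataFrame's column names reordered to
    // follow the table's column order, and fails fast if one is missing.
    // DataFrame columns that the table does not have are left out.
    static List<String> alignToTable(List<String> tableColumns, List<String> dataFrameColumns) {
        List<String> ordered = new ArrayList<>();
        for (String tableColumn : tableColumns) {
            String match = null;
            for (String candidate : dataFrameColumns) {
                if (candidate.equalsIgnoreCase(tableColumn)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException(
                        "DataFrame is missing table column: " + tableColumn);
            }
            ordered.add(match);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Table order as declared in the CREATE TABLE statement (lower-cased here).
        List<String> table = Arrays.asList("firstname", "lastname", "addresses");
        // Illustrative DataFrame columns in a different order.
        List<String> dataFrame = Arrays.asList("addresses", "lastName", "firstName");
        System.out.println(alignToTable(table, dataFrame));
    }
}
```

With Spark, the two lists could come from sparkSession.table("default.employee").columns() and firstRow.columns(), and the result could be passed to firstRow.selectExpr(...) before the write. Note that the query in the question selects only firstName and addresses, so lastName would have to be added to the select for the column counts to match.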

Upvotes: 1
