Reputation:
I have configured a Spark Streaming data pipeline. I would like to persist this data for a variety of goals:
exposing it to Tableau (this requires the Thrift Server, and the Thrift Server requires a HiveContext).
sometimes I would like to be able to update some of the data.
Where is the data kept by the HiveContext? In memory? On the local disk? Is it served by the Thrift Server?
Upvotes: 2
Views: 945
Reputation: 2924
You can persist your DataFrames from Spark to a Hive table by doing:
yourDataFrame.write.saveAsTable("YourTableName")
If you want to insert data into an existing table you can use:
yourDataFrame.write.mode(SaveMode.Append).saveAsTable("YourTableName")
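For the Spark Streaming part of the question, a minimal sketch could look like the one below. Event, eventStream and ssc are placeholder names for your own case class, DStream and StreamingContext; it simply appends every micro-batch to the Hive table:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Placeholder record type for the streamed data
case class Event(id: Long, value: String)

// Build the HiveContext once on the driver, from the StreamingContext's SparkContext
val hiveContext = new HiveContext(ssc.sparkContext)
import hiveContext.implicits._

// eventStream: DStream[Event] produced by your streaming source
eventStream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // Convert the micro-batch to a DataFrame and append it to the Hive table
    rdd.toDF().write.mode(SaveMode.Append).saveAsTable("YourTableName")
  }
}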
saveAsTable stores your DataFrame in a persistent Hive table. The location of this table depends on the configuration in your hive-site.xml.
By default, if you are testing locally, the table is stored on your local disk under /user/hive/warehouse/YourTableName.
If you are using Spark with Hive on YARN/HDFS, then the table is saved on HDFS at the location defined by the property hive.metastore.warehouse.dir in your hive-site.xml configuration file.
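If you would rather not edit hive-site.xml, the same property can also be set from code on the HiveContext, before the first table is written; the path below is only an example:

hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs:///user/hive/warehouse")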
Hope that will help :)
Upvotes: 1
Reputation: 9
You can choose to cache the data in memory using
your_hive_context.cacheTable("table_name")
The Thrift Server has access to a global context that contains all the tables, even the temporary ones.
If you cache the table, Tableau will get query results faster, but you have to keep the Spark application running.
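As a sketch of how this fits together, assuming you start the Thrift Server from inside the same Spark application so that it shares your HiveContext (sc and yourDataFrame are placeholders, and yourDataFrame is assumed to have been created with this same context):

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val your_hive_context = new HiveContext(sc)

// Expose the DataFrame as a table and pin it in memory
yourDataFrame.registerTempTable("table_name")
your_hive_context.cacheTable("table_name")

// Start the Thrift Server inside this application, sharing the same context,
// so Tableau (over JDBC/ODBC) queries the cached table directly
HiveThriftServer2.startWithContext(your_hive_context)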
I have not yet found a way to update some of the data without opening a new HiveContext.
Upvotes: 0