user6023611

Reputation:

Spark, HiveContext, ThriftServer - Table persistence

I have data coming in through Spark Streaming, and I would like to persist it for a variety of goals.

Where is data kept in HiveContext? In memory? On the local disk? Is it provided by thriftServer?

Upvotes: 2

Views: 945

Answers (2)

user1314742

Reputation: 2924

You can persist your DataFrames from Spark to a Hive table by doing: yourDataFrame.saveAsTable("YourTableName")

If you want to insert data into an existing table, you can use: yourDataFrame.write.mode(SaveMode.Append).saveAsTable("YourTableName")

This will save your DataFrame to a persistent Hive table. The location of this table depends on the configuration in your hive-site.xml.

By default, when testing locally, the table is stored on your local disk under /user/hive/warehouse/YourTableName

If you are running Spark with Hive on YARN/HDFS, the table will be saved to HDFS at the location defined by the hive.metastore.warehouse.dir property in your hive-site.xml configuration file.
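For example, here is a minimal sketch in Scala, assuming Spark 1.x with an existing SparkContext sc; the socket source, table name, and column name are illustrative, not part of your setup. Each streaming micro-batch is converted to a DataFrame and appended to a persistent Hive table:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Stand-in for whatever DStream your streaming job actually produces.
val lines = ssc.socketTextStream("localhost", 9999)

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // Convert the micro-batch to a DataFrame and append it to a Hive table
    // (the table is created on the first write).
    val df = rdd.map(Tuple1(_)).toDF("value")
    df.write.mode(SaveMode.Append).saveAsTable("YourTableName")
  }
}

ssc.start()
ssc.awaitTermination()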

Hope that will help :)

Upvotes: 1

Erica

Reputation: 9

You can choose to cache the table in memory using

your_hive_context.cacheTable("table_name")

The Thrift Server has access to a global context that contains all the tables, even the temporary ones.

If you cache the table, Tableau will get query results faster, but you have to keep the Spark batch application running.
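A minimal sketch of this setup, assuming Spark 1.x with an existing SparkContext sc and purely illustrative table names: the Thrift server is started with the same HiveContext, so JDBC/ODBC clients such as Beeline or Tableau see the cached and temporary tables of that context.

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new HiveContext(sc)

// Register a DataFrame as a temporary table and pin it in memory.
val df = hiveContext.table("YourTableName")
df.registerTempTable("table_name")
hiveContext.cacheTable("table_name")

// Start the Thrift server inside this application so external clients
// share this HiveContext. The application must keep running for the
// cached and temporary tables to stay available.
HiveThriftServer2.startWithContext(hiveContext)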

I have not yet found a way to update part of the data without opening a new HiveContext.

Upvotes: 0
