Reputation: 512
I have a Spark Thrift Server. I connect to the Thrift Server and query data from a Hive table. If I query the same table again, it loads the files into memory again and re-executes the query.
Is there any way I can cache the table data through the Spark Thrift Server? If yes, please let me know how to do it.
Upvotes: 3
Views: 1546
Reputation: 22711
Note that the memory may be consumed by the driver rather than the executors (depending on your settings, local/cluster, etc.), so don't forget to allocate more memory to your driver.
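As a rough sketch, you can increase the driver memory when starting the Thrift Server (the 4g value and the master URL below are placeholders for your own setup):

# Start the Thrift Server with more driver memory; options are passed through to spark-submit
./sbin/start-thriftserver.sh \
  --master yarn \
  --driver-memory 4g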
To put data in the cache:
CACHE TABLE today AS
SELECT * FROM datahub WHERE year=2017 AND fullname IN ("api.search.search") LIMIT 40000
Start by limiting the data, then watch how much memory is consumed, to avoid an OOM exception.
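Once the table is cached, later queries hit the in-memory data; a minimal follow-up using the today table from the example above:

-- Subsequent queries read from the cached table instead of re-scanning the source files
SELECT count(*) FROM today;

-- Free the memory when you no longer need the cached copy
UNCACHE TABLE today;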
Upvotes: 0
Reputation: 16096
Two things:
1. CACHE LAZY TABLE, as in these answers: Spark SQL: how to cache sql query result without using rdd.cache() and cache tables in apache spark sql
2. spark.sql.hive.thriftServer.singleSession=true, so that other clients can use the cached table.
Remember that caching is lazy, so the table will be cached during the first computation.
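A rough sketch of how the two pieces fit together; the datahub table name is borrowed from the example in the other answer and the start-up options are placeholders for your setup:

# Single-session mode: all JDBC clients share one SparkSession, and therefore the same cached tables
./sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=true

Then, from any client connected to the Thrift Server:

-- Mark the table for caching; nothing is materialized yet because of LAZY
CACHE LAZY TABLE datahub;

-- The first query that scans the table populates the cache ...
SELECT count(*) FROM datahub;

-- ... and later queries from any client read the in-memory data
SELECT * FROM datahub WHERE year = 2017 LIMIT 100;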
Upvotes: 2