Reputation: 993
I have a case where I want to download some data from a remote store every hour and store it as key-value pairs in an RDD on an executor/worker. I want to cache this RDD so that all future jobs/tasks/batches running on that executor/worker can use the cached RDD to do lookups. Is this possible in Spark Streaming?
Some relevant code or pointers to relevant code will be helpful.
Upvotes: 2
Views: 414
Reputation: 8995
If you just need a giant, distributed map, and you want to use Spark, write a standalone job that downloads the data every hour and caches the RDD thus obtained (you can unpersist the old RDD). Let us call this job DataRefresher.
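Here is a minimal sketch of that idea; fetchFromRemoteStore() is a hypothetical stand-in for whatever downloads your key-value data:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DataRefresher {
  @volatile private var current: RDD[(String, String)] = _

  // Hypothetical: replace with your actual remote-store download logic.
  def fetchFromRemoteStore(): Seq[(String, String)] = ???

  def refresh(spark: SparkSession): Unit = {
    val fresh = spark.sparkContext.parallelize(fetchFromRemoteStore()).cache()
    fresh.count()                        // materialize before swapping in
    val old = current
    current = fresh
    if (old != null) old.unpersist()     // drop the previous snapshot
  }

  // PairRDDFunctions.lookup returns all values for a key.
  def lookup(key: String): Seq[String] = current.lookup(key)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DataRefresher").getOrCreate()
    while (true) {
      refresh(spark)
      Thread.sleep(60 * 60 * 1000L)      // refresh once an hour
    }
  }
}
```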
You can then expose a REST API (if you are on Scala, consider using Scalatra) that wraps the DataRefresher and returns the value for a given key, something like http://localhost:9191/lookup/key, which other jobs can use to do a relatively fast lookup.
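A hypothetical Scalatra route wrapping that lookup might look like this (the servlet name and response format are assumptions):

```scala
import org.scalatra.ScalatraServlet

// Exposes DataRefresher over HTTP, e.g. GET /lookup/somekey
class LookupServlet extends ScalatraServlet {
  get("/lookup/:key") {
    DataRefresher.lookup(params("key")) match {
      case Seq()  => halt(404, s"no value for key ${params("key")}")
      case values => values.mkString("\n")
    }
  }
}
```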
Upvotes: 0
Reputation: 231
Alluxio is a memory-centric distributed storage system. It can be used to cache Spark RDDs in memory so that multiple, or future, Spark applications and jobs can access them.
Spark can store RDDs in Alluxio memory, and later Spark jobs can read them back from there; the Alluxio documentation covers how to set up and configure Alluxio with Spark.
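For illustration, here is a sketch of that write/read pattern, assuming an Alluxio master at alluxio-master:19998 and a made-up path:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("AlluxioExample").getOrCreate()

// One application writes the key-value snapshot into Alluxio...
val kv = spark.sparkContext.parallelize(Seq("a,1", "b,2"))
kv.saveAsTextFile("alluxio://alluxio-master:19998/cache/kv-snapshot")

// ...and a later, separate application reads it back from Alluxio memory.
val cached = spark.sparkContext.textFile("alluxio://alluxio-master:19998/cache/kv-snapshot")
```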
Upvotes: 3
Reputation: 750
Given your requirements, here is what I would propose:
Note: Your notion of "caching within an executor to use across applications" is not correct. Executors belong to a single Spark application, and so does any RDD within that application.
If you really need to invest in caching data on distributed nodes, you may want to consider an off-heap in-memory storage system such as Tachyon (now renamed Alluxio).
Upvotes: 0