Reputation: 37
I use Spark (in Java) to create an RDD of complex objects. Is it possible to save this object permanently in memory, so that I can use it again with Spark in the future?
(Because Spark cleans up memory after an application or job finishes.)
Upvotes: 1
Views: 2117
Reputation: 16999
Spark is not intended as permanent storage; you can use HDFS, ElasticSearch, or another 'Spark-compatible' cluster storage for this.
Spark reads data from cluster storage, does some work in random-access memory (RAM), optionally caching intermediate results, and then usually writes the results back to cluster storage, because there may be too much output for a local hard drive.
Example: Read from HDFS -> Spark ... RDD ... -> Store results in HDFS
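For your case of an RDD of complex (serializable) objects, a minimal Java sketch of that cycle could look like the following. The HDFS paths and the MyComplexObject class are hypothetical; saveAsObjectFile / objectFile are one way to round-trip an RDD through cluster storage so a later application can pick it up again:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.io.Serializable;
import java.util.Arrays;

public class HdfsRoundTrip {
    // Hypothetical complex object; it must be Serializable to be
    // written with saveAsObjectFile
    public static class MyComplexObject implements Serializable {
        public final int id;
        public MyComplexObject(int id) { this.id = id; }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("HdfsRoundTrip"));

        // Build the RDD in RAM
        JavaRDD<MyComplexObject> rdd = sc.parallelize(Arrays.asList(1, 2, 3))
                                         .map(MyComplexObject::new);

        // Write to cluster storage so it outlives this application
        // (the path is hypothetical)
        rdd.saveAsObjectFile("hdfs:///data/objects");

        // A later application can read it back from the same path
        JavaRDD<MyComplexObject> restored = sc.objectFile("hdfs:///data/objects");
        System.out.println(restored.count());

        sc.stop();
    }
}
```

Note that saveAsObjectFile relies on Java serialization, so for long-term storage or interoperability a structured format such as Parquet or plain text is often preferable.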
You must distinguish between slow persistent storage like hard drives (disk, SSD) and fast volatile memory like RAM. The strength of Spark is making heavy use of random-access memory (RAM).
You may use caching for temporary storage within a single application; see: (Why) do we need to call cache or persist on a RDD
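As a rough sketch of what caching does (and does not do), assuming a trivial RDD: persist keeps the data in RAM between actions, but only for the lifetime of the application, not across applications.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

import java.util.Arrays;

public class CacheExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("CacheExample"));

        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));

        // Keep the RDD in RAM between actions; this is temporary and
        // is discarded when the application stops
        rdd.persist(StorageLevel.MEMORY_ONLY());

        System.out.println(rdd.count()); // first action materializes and caches
        System.out.println(rdd.count()); // later actions reuse the cached partitions

        sc.stop();
    }
}
```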
Upvotes: 3