Aviral Kumar

Reputation: 824

Storing Data In-Memory in Spark

I have a requirement to keep data in Spark's memory in table format even after the SparkContext object dies, so that Tableau can access it.

I have used registerTempTable, but the data gets removed once the SparkContext object dies. Is it possible to store data like this? If not, what options can I look into for feeding data to Tableau without reading it from an HDFS location?
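For context, this is roughly what I am doing today (a minimal sketch in Spark 1.x Scala; the input path and table name are illustrative). The temp table only lives as long as the SQLContext/SparkContext that created it:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object TempTableDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("TempTableDemo"))
        val sqlContext = new SQLContext(sc)

        // hypothetical input path
        val df = sqlContext.read.json("hdfs:///data/events.json")
        df.registerTempTable("events")
        sqlContext.sql("CACHE TABLE events") // pins the table in executor memory

        // ... serve queries while the context is alive ...

        sc.stop() // after this, "events" and its cached blocks are gone
      }
    }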

Upvotes: 0

Views: 1517

Answers (3)

Aviral Kumar

Reputation: 824

I came across a very interesting answer to the question above: Tachyon. http://ampcamp.berkeley.edu/5/exercises/tachyon.html

Upvotes: 0

Roman Kagan

Reputation: 464

Does Tableau read data from a custom Spark application?

I use Power BI (instead of Tableau), and it queries Spark through the Thrift server, so each time Spark dies and restarts, I send it a "CACHE TABLE myTable" query through the ODBC/JDBC driver.
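For example, the re-cache step can be scripted with a small JDBC client. A minimal sketch, assuming the Spark Thrift server is listening on localhost:10000 and a table named myTable is already registered there (the Spark Thrift server speaks the HiveServer2 protocol, so the standard Hive JDBC driver works):

    import java.sql.DriverManager

    object ReCacheOnRestart {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://localhost:10000/default", "", "")
        try {
          val stmt = conn.createStatement()
          stmt.execute("CACHE TABLE myTable") // re-pin the table after a restart
          stmt.close()
        } finally {
          conn.close()
        }
      }
    }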

Upvotes: 0

Mateusz Dymczyk

Reputation: 15141

You will need to do one of the following:

  1. Run your Spark application as a long-running application. Spark Streaming usually does this out of the box (when you call StreamingContext.awaitTermination()). I have never tried it myself, but I think YARN and Mesos have support for long-running tasks. As you mentioned, whenever your SparkContext dies, all the data is lost (because all the information is stored in the context). I consider spark-shell a long-running application; that's why most Tableau/Spark demos use it, since the context never dies. See the sketch after this list.
  2. Store the data in a data store (HDFS, a database, etc.).
  3. Try some distributed in-memory framework/file system like Tachyon. I'm not sure if it has Tableau connectors, though.
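A minimal sketch of option 1, assuming Spark 1.x with the spark-hive-thriftserver module on the classpath (the input path and table name are illustrative). The driver blocks forever, so the SparkContext and the cached table survive, and Tableau can connect through the embedded Thrift server:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object LongRunningTableServer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LongRunningTableServer"))
        val hiveContext = new HiveContext(sc)

        // hypothetical input path
        val df = hiveContext.read.parquet("hdfs:///data/events")
        df.registerTempTable("events")
        hiveContext.sql("CACHE TABLE events")

        // Expose this context's tables over JDBC/ODBC (HiveServer2 protocol).
        HiveThriftServer2.startWithContext(hiveContext)

        // Block forever so the SparkContext (and the cached table) stay alive.
        Thread.currentThread().join()
      }
    }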

Upvotes: 2
