Reputation: 6933
I have a single test node with 8 GB RAM on which I am loading barely 10 MB of data (from CSV files) into Cassandra (running on the same node). I'm trying to process this data using Spark (also running on the same node).
Please note that I'm allocating 1 GB of RAM for SPARK_MEM and the same for SPARK_WORKER_MEMORY. Allocating any extra memory results in Spark throwing a "Check if all workers are registered and have sufficient memory" error, which is more often than not an indication that Spark is looking for more memory (as per the SPARK_MEM and SPARK_WORKER_MEMORY settings) and coming up short.
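For reference, in my standalone setup those two settings are declared in conf/spark-env.sh roughly as follows (the file location assumes the standalone deployment; the values are the 1 GB mentioned above):
export SPARK_MEM=1g
export SPARK_WORKER_MEMORY=1g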
When I try to load and process all the data in the Cassandra table through the Spark context object, I get an error during processing. So I'm trying to use a looping mechanism to read chunks of data at a time from one table, process them, and put them into another table.
My source code has the following structure:
var data = sc.cassandraTable("keyspacename", "tablename").where("value=?", 1)
data.map(x => transformFunction(x)).saveToCassandra("keyspacename", "tablename")

for (i <- 2 to 50000) {
  data = sc.cassandraTable("keyspacename", "tablename").where("value=?", i)
  data.map(x => transformFunction(x)).saveToCassandra("keyspacename", "tablename")
}
Now, this works for a while, for around 200 iterations, and then it throws an error: java.lang.OutOfMemoryError: unable to create new native thread.
I've got two questions:
Is this the right way to deal with data?
How can processing just 10 MB of data do this to a cluster?
Upvotes: 2
Views: 1796
Reputation: 359
You are running a query inside the for loop. If the 'value' column is not a key or indexed column, the where clause cannot be pushed down to Cassandra, so Spark loads the table into memory and then filters on the value, once per iteration. This will certainly cause an OOM.
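If all you need is to transform every row, a single pass over the table avoids both the 50,000 server-side queries and the repeated in-memory filtering. Below is a minimal sketch, assuming the DataStax spark-cassandra-connector and the keyspace/table names from the question; transformFunction is the function from the question, and "targettablename" is only a placeholder for the destination table:

import com.datastax.spark.connector._  // connector implicits for cassandraTable / saveToCassandra

// Read the source table once; the connector splits it into Spark partitions,
// so there is no per-value query and no repeated full-table scan.
val rows = sc.cassandraTable("keyspacename", "tablename")

// Transform every row in one pass and write the results out.
rows.map(x => transformFunction(x)).saveToCassandra("keyspacename", "targettablename")

If you do need to query by value, making it part of the primary key (or adding a secondary index) lets the where clause be served by Cassandra instead of being filtered inside Spark.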
Upvotes: 1