Reputation: 93
I have a Spark cluster in Docker with 1 master and 2 workers. Apache Ignite is running on every worker.
If I open spark-shell and execute the following commands (which open the cache, store some values, and read the data back from the cache):
import org.apache.ignite.spark._
import org.apache.ignite.configuration._
val ic = new IgniteContext(sc, () => new IgniteConfiguration())
val sharedRDD: IgniteRDD[Integer, Integer] = ic.fromCache[Integer, Integer]("partitioned")
sharedRDD.savePairs(sc.parallelize(1 to 100, 10).map(i => (i, i)))
sharedRDD.count
then I receive:
res3: Long = 100
If I execute sharedRDD.collect().foreach(println), every number pair up to 100 is in the list:
(1,1)
(2,2)
(3,3)
(4,4)
(5,5)
(6,6)
(7,7)
(8,8)
(9,9)
(10,10)
...
(100,100)
It's perfect.
BUT when I quit with sys.exit, reopen spark-shell, and execute the following code (which only reads the data back from the cache):
import org.apache.ignite.spark._
import org.apache.ignite.configuration._
val ic = new IgniteContext(sc, () => new IgniteConfiguration())
val sharedRDD: IgniteRDD[Integer, Integer] = ic.fromCache[Integer, Integer]("partitioned")
sharedRDD.count
sharedRDD.collect().foreach(println)
then the result is
res0: Long = 60
and some number pairs are missing (for example 4, 9, and 10):
(1,1)
(2,2)
(3,3)
(5,5)
(6,6)
(7,7)
(8,8)
(11,11)
(12,12)
(13,13)
(14,14)
(15,15)
...
Does anybody have an idea why this happens?
Upvotes: 1
Views: 198
Reputation: 8390
There is an issue that causes Ignite nodes embedded into Spark executors to be started in server mode [1]; most likely this is the reason. As a workaround, you can force IgniteContext
to start everything in client mode:
val ic = new IgniteContext(sc, () => new IgniteConfiguration().setClientMode(true))
Of course, this assumes that you run in standalone mode with a separately running Ignite cluster.
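If you want to be explicit about how the client nodes find that standalone cluster, you can also wire the discovery by hand. Below is a minimal sketch assuming static IP discovery; the host names worker1/worker2 and the 47500..47509 port range are placeholders for whatever your Docker setup actually uses:
import org.apache.ignite.spark._
import org.apache.ignite.configuration._
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

// Hypothetical worker host names; 47500..47509 is the default discovery port range.
def igniteCfg(): IgniteConfiguration = {
  val ipFinder = new TcpDiscoveryVmIpFinder()
  ipFinder.setAddresses(java.util.Arrays.asList("worker1:47500..47509", "worker2:47500..47509"))

  val discoSpi = new TcpDiscoverySpi()
  discoSpi.setIpFinder(ipFinder)

  // Client mode: the nodes started inside the executors only connect to the
  // standalone cluster and never own cache partitions themselves.
  new IgniteConfiguration().setClientMode(true).setDiscoverySpi(discoSpi)
}

val ic = new IgniteContext(sc, () => igniteCfg())
val sharedRDD: IgniteRDD[Integer, Integer] = ic.fromCache[Integer, Integer]("partitioned")
With all data owned exclusively by the standalone server nodes, the cache contents survive quitting and reopening spark-shell.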
[1] https://issues.apache.org/jira/browse/IGNITE-5981
Upvotes: 3