Reputation: 123
I have three matrices (A, B and C) as individual RDDs, and I need to partition them among worker nodes as blocks. The action I perform needs to update the matrix blocks, but I need to synchronize on them so that two worker nodes don't update the same block at the same time. How can I achieve this synchronization? Is there a mechanism for locking? I am very new to Spark (PySpark).
Is it possible to control how partitioning is done by Spark, i.e. to control which block is sent to which worker node?
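For context, here is roughly what I have in mind (a minimal sketch; the block keys and shapes are just placeholders):

```python
# Key each block by its (block_row, block_col) index and use partitionBy
# with a custom partition function. As far as I understand, this controls
# which partition a block lands in, but Spark still decides which worker
# node actually runs that partition.
from pyspark import SparkContext

sc = SparkContext("local[4]", "block-partitioning")

# Hypothetical blocked matrix A: ((block_row, block_col), block) pairs
a_blocks = sc.parallelize([((i, j), [[0.0, 0.0], [0.0, 0.0]])
                           for i in range(2) for j in range(2)])

def block_partitioner(key):
    i, j = key
    return i * 2 + j  # deterministic block -> partition mapping

partitioned = a_blocks.partitionBy(4, block_partitioner)
print(partitioned.glom().map(len).collect())  # blocks per partition
```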
Please help.
Upvotes: 0
Views: 1031
Reputation: 330193
Technically it doesn't matter at all. There is no such thing as shared, mutable state in Spark (one could argue that this is the case with accumulators, but let's not dwell on that). That means there is no situation where a computation can modify shared state, so no kind of lock is required.
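For completeness, a minimal sketch of that accumulator caveat (the names here are illustrative): workers can only add to an accumulator, and only the driver can read its value.

```python
from pyspark import SparkContext

sc = SparkContext("local[4]", "accumulator-demo")

# Accumulators are the closest thing to shared state: workers may add to
# them, but the value is only readable on the driver.
acc = sc.accumulator(0)
sc.parallelize([1, 2, 3, 4]).foreach(lambda x: acc.add(x))
print(acc.value)  # 10, visible only on the driver
```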
This is a little more complicated on the JVM, but the PySpark architecture provides complete isolation between workers, so unless you go outside Spark you're safe. If you do, it is your responsibility to handle conflicts using context-specific methods.
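To illustrate that isolation, a minimal sketch (the counter is mine, not from the question): each worker process operates on its own copy of the closure, so a mutation on a worker is never visible to the driver or to other workers.

```python
from pyspark import SparkContext

sc = SparkContext("local[4]", "isolation-demo")

counter = 0

def increment(_):
    global counter
    counter += 1  # mutates the worker's private copy of the closure

sc.parallelize(range(100)).foreach(increment)
print(counter)  # still 0 on the driver
```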
Finally, if you try to modify data in place (please don't confuse data with RDDs), it is simply a programming mistake. It can lead to some really ugly things on the JVM but, once again, should have no visible effect in PySpark (this is just a matter of implementation, not a contract). Every change should be expressed using transformations and, unless specified otherwise (see for example the fold or aggregate family), shouldn't modify existing data.
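A minimal sketch of that transformation-based pattern (block keys and the update itself are illustrative): instead of locking and mutating a block in place, derive a new RDD of updated blocks.

```python
from pyspark import SparkContext

sc = SparkContext("local[4]", "immutable-update")

# ((block_row, block_col), block) pairs for a hypothetical blocked matrix
a_blocks = sc.parallelize([((i, j), [[1.0, 1.0], [1.0, 1.0]])
                           for i in range(2) for j in range(2)])

# "Update" every block by returning a new value; the source RDD is never
# mutated, so no two workers can race on the same block.
updated = a_blocks.mapValues(
    lambda block: [[2.0 * x for x in row] for row in block])

print(updated.first())
```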
Upvotes: 1