CClarke

Reputation: 576

Why shouldn't local mode in Spark be used for production?

The official documentation and all sorts of books and articles repeat the recommendation that Spark in local mode should not be used for production purposes. Why not? Why is it a bad idea to run a Spark application on one machine for production purposes? Is it simply because Spark is designed for distributed computing and if you only have one machine there are much easier ways to proceed?

Upvotes: 3

Views: 1281

Answers (3)

Alex Gidiotis

Reputation: 11

I agree that this is largely ignored in the official documentation, but there are actually some benefits to running Spark even in local mode (e.g., instead of pure Python or Scala). There's a great resource and benchmark with more details here.

In summary the main advantages:

  • A single, unified API that scales from “small data” on a laptop to “big data” on a cluster.
  • Spark can often be faster, due to parallelism, than single-node PyData tools.
  • Spark can have lower memory consumption and can process more data than fits in the laptop’s memory, as it does not require loading the entire data set into memory before processing.
  • Offers a number of abstractions as well as fault tolerance for parallel computing.
  • Offers a number of algorithms and functions that are optimized for parallel computing.

Upvotes: 1

Vishal

Reputation: 1492

Local mode in Apache Spark is intended for development and testing purposes, and should not be used in production because:

  • Scalability: Local mode runs on a single machine, so it cannot scale out to large data sets or meet the processing demands of a production environment.
  • Resource Management: Spark’s standalone cluster manager or a cluster manager like YARN, Mesos, or Kubernetes provides more advanced resource management capabilities for production environments compared to local mode.
  • Fault Tolerance: Local mode does not have the ability to recover from failures, while a cluster manager can provide fault tolerance by automatically restarting failed tasks on other nodes.
  • Security: Spark’s cluster manager provides built-in security features such as authentication and authorization, which are not present in local mode.

Therefore, it is recommended to use a cluster manager for production environments to ensure scalability, resource management, fault tolerance, and security.
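
In practice, switching from local mode to one of the cluster managers listed above is just a matter of the `--master` URL passed to `spark-submit` (the hostnames and application file below are placeholders):

```shell
# Local mode: driver and executors all run in one JVM (dev/testing only).
spark-submit --master "local[*]" my_app.py

# Standalone cluster manager: driver and executors are spread across nodes.
spark-submit --master spark://master-host:7077 my_app.py

# YARN (similarly, Kubernetes) supplies the resource management, fault
# tolerance, and security features described above.
spark-submit --master yarn --deploy-mode cluster my_app.py
```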

Upvotes: 2

Shady el Gewily

Reputation: 1

I have the same question. I am certainly not an authority on the subject, but since no one has answered this question, I'll try to list the reasons I've encountered while using Spark local mode in Java. So far:

Upvotes: 0
