Rabjot Singh

Reputation: 11

What's the difference between Spark ML and SystemML?

What's the difference between Spark ML and SystemML? Both seem to solve the same problems on the Apache Spark engine on IBM, so I want to know what the main difference is.

Upvotes: 1

Views: 432

Answers (1)

Rough Manly

Reputation: 25

Apache Spark is a distributed, data-parallel framework with rich primitives such as map, reduce, join, filter, etc. Additionally, it powers a stack of “libraries” including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming.
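
To give a feel for what those primitives do, here is a minimal sketch in plain Python. It is not Spark code and runs on a single machine; the sample data and the function name `total_large_purchases` are made up for illustration. Spark applies the same kinds of operations to partitioned data across a cluster.

```python
from functools import reduce

# Hypothetical sample data: (user_id, amount) purchases and a user_id -> name lookup.
PURCHASES = [(1, 20.0), (2, 5.0), (1, 7.5), (3, 12.0)]
USERS = {1: "ana", 2: "bo", 3: "cy"}


def total_large_purchases(purchases, users, threshold):
    """Chain filter, join, map, and reduce over an in-memory list."""
    # filter: keep only purchases at or above the threshold
    large = filter(lambda p: p[1] >= threshold, purchases)
    # join: attach the user name by matching on user_id
    joined = [(users[uid], amount) for uid, amount in large if uid in users]
    # map: project each joined row down to its amount
    amounts = map(lambda row: row[1], joined)
    # reduce: fold the amounts into a single total
    return reduce(lambda acc, x: acc + x, amounts, 0.0)


total = total_large_purchases(PURCHASES, USERS, threshold=7.0)
```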

Apache SystemML is a flexible, scalable machine learning (ML) system, enabling algorithm customization and automatic optimization. SystemML’s distinguishing characteristics are:

- Algorithm customizability via R-like and Python-like languages.
- Multiple execution modes, including Spark MLContext, Spark Batch, Hadoop Batch, Standalone, and JMLC.
- Automatic optimization based on data and cluster characteristics to ensure both efficiency and scalability.

A more useful comparison would be between Spark’s MLlib and SystemML:

- Like MLlib, SystemML can run on top of Apache Spark (in batch mode or through the programmatic API). But unlike MLlib, which has a fixed runtime plan, SystemML has an optimizer that adapts the runtime plan to the input data and cluster characteristics.
- Both MLlib and SystemML can accept input data as a Spark DataFrame.
- MLlib’s algorithms are written in Scala using Spark’s primitives. At a high level, MLlib has two kinds of users: (1) expert developers who implement their algorithms in Scala and have a deep understanding of Spark’s core, and (2) non-expert data scientists who want to use MLlib as a black box and tweak the hyperparameters. Both rely heavily on initial assumptions about the input data and cluster characteristics. If those assumptions do not hold for a given use case in production, the user can get poor performance or even out-of-memory (OOM) errors.
- SystemML’s algorithms are implemented in a high-level, linear-algebra-friendly language, and its optimizer dynamically compiles the runtime plan based on the input data and cluster characteristics. To simplify usage, SystemML comes with a set of pre-implemented algorithms along with MLlib-like wrappers.
- Unlike MLlib, SystemML’s algorithms can be used on other backends, such as Apache Hadoop, embedded in-memory, GPU, and possibly Apache Flink in the future.
- Examples of machine learning systems with a cost-based optimizer (similar to SystemML): Mahout Samsara, Tupleware, Cumulon, Dmac and SimSQL.
- Examples of machine learning libraries with a fixed plan (similar to MLlib): Mahout MR, MADlib, ORE, Revolution R and HP’s Distributed R.
- Examples of distributed systems with domain-specific languages (similar to Apache Spark): Apache Flink, REEF and GraphLab.
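
The contrast between a fixed runtime plan and an optimizer that adapts to data and cluster characteristics can be sketched with a toy example in plain Python. This is only an illustration of the idea, not SystemML's actual optimizer or API. The names `MatrixStats`, `estimate_bytes`, `fixed_plan` and `adaptive_plan`, the 12-bytes-per-non-zero sparse estimate, and the 70% memory budget are all assumptions made up for this sketch.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8
# Assumed cost of one sparse entry: an 8-byte value plus a 4-byte column index.
BYTES_PER_SPARSE_ENTRY = 12


@dataclass(frozen=True)
class MatrixStats:
    """Hypothetical summary of an input matrix, known before execution."""
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells, between 0 and 1


def estimate_bytes(m: MatrixStats) -> int:
    """Rough memory estimate using the cheaper of dense and sparse storage."""
    cells = m.rows * m.cols
    dense = cells * BYTES_PER_DOUBLE
    sparse = int(cells * m.sparsity * BYTES_PER_SPARSE_ENTRY)
    return min(dense, sparse)


def fixed_plan(m: MatrixStats, driver_memory_bytes: int) -> str:
    """A fixed plan ignores the input statistics and always distributes."""
    return "distributed"


def adaptive_plan(m: MatrixStats, driver_memory_bytes: int,
                  budget_fraction: float = 0.7) -> str:
    """Pick a single-node plan when the data fits in the memory budget."""
    if estimate_bytes(m) <= budget_fraction * driver_memory_bytes:
        return "single_node"
    return "distributed"


ONE_GIB = 1024 ** 3
small_dense = MatrixStats(rows=1_000, cols=100, sparsity=1.0)
huge_dense = MatrixStats(rows=10 ** 8, cols=1_000, sparsity=1.0)
huge_sparse = MatrixStats(rows=10 ** 8, cols=1_000, sparsity=0.000001)

plans = {
    name: (fixed_plan(m, ONE_GIB), adaptive_plan(m, ONE_GIB))
    for name, m in [("small_dense", small_dense),
                    ("huge_dense", huge_dense),
                    ("huge_sparse", huge_sparse)]
}
```

In this sketch the fixed plan pays the cost of distribution even for the small matrix, while the adaptive plan keeps small or very sparse inputs on a single node and only distributes the input that would not fit in memory.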

Upvotes: 0
