Dean Schulze

Reputation: 10303

Doing basic linear algebra in Spark 2.4

Does Spark 2.4 have Vector and Matrix classes that support basic linear algebra operations like dot product, norm, matrix and vector multiplication? I can't find any linear algebra support in classes like Vector, DenseVector, or RowMatrix.

Older versions of Spark had org.jblas.DoubleMatrix, but that doesn't exist in Spark 2.4 and I can't find what they replaced it with.

Where do I look for linear algebra examples in Spark 2.4?

I don't need RDDs for my current need (cosine similarity).

Upvotes: 3

Views: 1533

Answers (2)

Gabriel Hernandez

Reputation: 573

Adding to Daniel Sobrado's good answer: Spark 2.4 also comes with Breeze (Breeze Linear Algebra). In this library, matrices default to column-major ordering, like Matlab, but indexing is 0-based, like NumPy.

Breeze supports indexing and slicing, linear algebra functions (linear solve, transpose, determinant, inverse, eigenvalues, eigenvectors, singular value decomposition), and operations (vector dot product, elementwise addition, matrix multiplication, elementwise multiplication, elementwise max, elementwise argmax), among others.

Note that Breeze uses netlib-java for its core linear algebra routines. Below is an example of Scala code that uses Breeze:

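import breeze.linalg.DenseVector
import com.github.fommil.netlib.BLAS
import org.slf4j.LoggerFactory

object Breeze1 {
  def main(args: Array[String]): Unit = {
    println("Init logging...")
    // Requires slf4j-simple on the classpath; must be set before the first logger is created.
    System.setProperty(org.slf4j.impl.SimpleLogger.DEFAULT_LOG_LEVEL_KEY, "TRACE")
    val log = LoggerFactory.getLogger("main")
    log.trace("Starting...")
    // Shows which netlib-java BLAS implementation was loaded (native or pure-Java fallback).
    val b = BLAS.getInstance()
    log.trace(s"BLAS = $b")
    // Build a Breeze vector and use it in a dot product.
    val v = DenseVector(1, 2, 3, 4)
    log.trace(s"v dot v = ${v dot v}")
    log.trace("Ending.")
  }
}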
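
Since the question is about cosine similarity, here is a minimal sketch of how it could be done with Breeze alone. The object name BreezeCosine and the helper cosineSimilarity are my own, not part of Breeze; dot, norm, and the matrix-vector * operator are the Breeze operations mentioned above. It assumes neither vector is all zeros.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, norm}

object BreezeCosine {
  // Hypothetical helper: (a . b) / (||a|| * ||b||).
  // Assumes both vectors are non-zero; otherwise the result is NaN.
  def cosineSimilarity(a: DenseVector[Double], b: DenseVector[Double]): Double =
    (a dot b) / (norm(a) * norm(b))

  def main(args: Array[String]): Unit = {
    val a = DenseVector(1.0, 2.0, 3.0)
    val b = DenseVector(2.0, 4.0, 6.0)

    println(a dot b)                // dot product
    println(norm(a))                // Euclidean (L2) norm
    println(cosineSimilarity(a, b)) // a and b are parallel, so this is close to 1.0

    // 2x3 matrix built row by row, then a matrix-vector product.
    val m = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
    println(m * a)
  }
}
```

Everything here runs locally on the driver, so no RDD or SparkContext is involved.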

Upvotes: 2

Daniel Sobrado

Reputation: 747

I've found that jblas was removed because of an incompatible license and replaced with netlib-java. You might want to look into it: it is a wrapper for low-level BLAS, LAPACK, and ARPACK.

MLlib provides dense and sparse vectors and matrices; the distributed matrix types are based on RDDs. (I understand that you are looking for the low-level implementation.)

For vectors and matrices you can use org.apache.spark.mllib.linalg.{Vector, Vectors, Matrix, Matrices}; these support both dense and sparse vectors and matrices.
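
As a rough sketch of what these local types give you: Vectors.norm and Matrix.multiply are available, but as far as I can tell the local Vector in 2.4 does not expose a public dot method, so the dot helper below is my own and simply works on the dense arrays. The object and helper names are hypothetical, and the code assumes non-zero vectors.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}

object MllibLocalCosine {
  // Hypothetical helper: dot product over the dense array form of both vectors.
  // Simple rather than efficient for very sparse inputs.
  def dot(a: Vector, b: Vector): Double = {
    require(a.size == b.size, "vectors must have the same length")
    a.toArray.zip(b.toArray).map { case (x, y) => x * y }.sum
  }

  // Hypothetical helper: cosine similarity using Vectors.norm for the L2 norms.
  def cosineSimilarity(a: Vector, b: Vector): Double =
    dot(a, b) / (Vectors.norm(a, 2.0) * Vectors.norm(b, 2.0))

  def main(args: Array[String]): Unit = {
    val a = Vectors.dense(1.0, 2.0, 3.0)
    val b = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)) // (1.0, 0.0, 3.0)
    println(cosineSimilarity(a, b))

    // Values are column-major, so this is the 2x3 matrix [[1, 0, 0], [0, 2, 0]].
    val m = Matrices.dense(2, 3, Array(1.0, 0.0, 0.0, 2.0, 0.0, 0.0))
    println(m.multiply(a)) // matrix-vector product
  }
}
```

These are local objects, so this needs the spark-mllib jar on the classpath but no SparkContext.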

RowMatrix is org.apache.spark.mllib.linalg.distributed.RowMatrix.

You can refer to the documentation: https://spark.apache.org/docs/latest/mllib-data-types.html

In fact, you can find a cosine similarity implementation among the MLlib examples in the Spark repo: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala
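
That example is built around RowMatrix.columnSimilarities(). A stripped-down sketch of the same idea, running in local mode, might look like the following. The object name, the columnCosines helper, and the sample data are my own. Note that it compares the columns of the matrix, not its rows, and it does use an RDD under the hood.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object ColumnSimilaritiesSketch {
  // Hypothetical helper: cosine similarity for each column pair (i, j) with i < j.
  // columnSimilarities() returns the upper triangle as a CoordinateMatrix.
  def columnCosines(sc: SparkContext, rows: Seq[Vector]): Map[(Long, Long), Double] = {
    val mat = new RowMatrix(sc.parallelize(rows))
    mat.columnSimilarities().entries.collect().map(e => (e.i, e.j) -> e.value).toMap
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for this sketch; use your cluster settings instead.
    val conf = new SparkConf().setAppName("ColumnSimilaritiesSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Two rows, three columns: the columns are (1, 2), (2, 4), and (0, 1).
      val rows = Seq(Vectors.dense(1.0, 2.0, 0.0), Vectors.dense(2.0, 4.0, 1.0))
      columnCosines(sc, rows).toSeq.sortBy(_._1).foreach {
        case ((i, j), value) => println(s"($i, $j) -> $value")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If you only have a handful of local vectors, this is probably more machinery than you need.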

Upvotes: 3
