Reputation: 10303
Does Spark 2.4 have Vector and Matrix classes that support basic linear algebra operations like dot product, norm, matrix and vector multiplication? I can't find any linear algebra support in classes like Vector, DenseVector, or RowMatrix.
Older versions of Spark had org.jblas.DoubleMatrix, but that doesn't exist in Spark 2.4 and I can't find what they replaced it with.
Where do I look for linear algebra examples in Spark 2.4?
I don't need RDDs for my current use case (cosine similarity).
Upvotes: 3
Views: 1533
Reputation: 573
Adding to Daniel Sobrado's good response: Spark 2.4 also comes with Breeze support (Breeze Linear Algebra).
In this library, matrices default to column-major ordering, like Matlab, but indexing is 0-based, like NumPy.
Breeze supports indexing and slicing, linear algebra functions (linear solve, transpose, determinant, inverse, eigenvalues, eigenvectors, singular value decomposition), and operations (vector dot product, elementwise addition, shaped/matrix multiplication, elementwise multiplication, elementwise max, elementwise argmax), etc.
Note that Breeze uses netlib-java for its core linear algebra routines.
Below is an example of Scala code that uses Breeze:
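import breeze.linalg.DenseVector
import com.github.fommil.netlib.BLAS
import org.slf4j.LoggerFactory

object Breeze1 {
  def main(args: Array[String]): Unit = {
    println("Init logging...")
    // Set the slf4j-simple default level before the first logger is created.
    System.setProperty(org.slf4j.impl.SimpleLogger.DEFAULT_LOG_LEVEL_KEY, "TRACE")
    val log = LoggerFactory.getLogger("main")
    log.trace("Starting...")
    // Shows which BLAS implementation netlib-java picked up (native or pure JVM fallback).
    val b = BLAS.getInstance()
    log.trace(s"BLAS = $b")
    // A Breeze dense vector; it is only constructed here, not used further.
    val v = DenseVector(1, 2, 3, 4)
    log.trace("Ending.")
  }
}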
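Since the question is about cosine similarity, here is a minimal sketch of how the Breeze operations listed above (dot product, norm, matrix-vector multiplication) could be combined. The object name `BreezeCosine` and the helper `cosine` are made up for illustration, and the sketch assumes the Breeze jar that ships with Spark is on the classpath.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, norm}

object BreezeCosine {
  // Cosine similarity of two dense vectors: (a . b) / (|a| * |b|).
  // Hypothetical helper, not part of Breeze itself.
  def cosine(a: DenseVector[Double], b: DenseVector[Double]): Double =
    (a dot b) / (norm(a) * norm(b))

  def main(args: Array[String]): Unit = {
    val a = DenseVector(1.0, 2.0, 3.0)
    val b = DenseVector(2.0, 4.0, 6.0)
    // b is a scaled copy of a, so the similarity should be approximately 1.0.
    println(cosine(a, b))

    // A 2x3 matrix times a length-3 vector gives a length-2 vector.
    val m = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
    println(m * a)
  }
}
```

No SparkContext or RDD is involved here; everything runs locally on the driver.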
Upvotes: 2
Reputation: 747
I've found that JBlas was removed because of an incompatible license and replaced with netlib-java. You might want to look into it; it is a wrapper for low-level BLAS, LAPACK and ARPACK.
MLlib has dense and sparse vectors and matrices; the distributed matrix types (such as RowMatrix) are based on RDDs, while the local types are not. (I understand that you are looking for the low-level implementation.)
For vectors and matrices you can use org.apache.spark.mllib.linalg.{Vector, Vectors, Matrix, Matrices}, which support both dense and sparse vectors and matrices.
RowMatrix is org.apache.spark.mllib.linalg.distributed.RowMatrix.
You can refer to the documentation: https://spark.apache.org/docs/latest/mllib-data-types.html
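The local vector types mentioned above can be used without any RDD. As a rough sketch, cosine similarity between two `mllib` vectors might look like the following. The object `MllibCosine` and its `dot` and `cosine` helpers are hypothetical names. The dot product is computed by hand on the assumption that the public `Vector` API in 2.4 does not expose one; `Vectors.norm` is the only linear algebra call relied on.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object MllibCosine {
  // Hand-rolled dot product over the dense arrays (assumes no public dot in 2.4).
  def dot(a: Vector, b: Vector): Double = {
    require(a.size == b.size, "vectors must have the same length")
    a.toArray.zip(b.toArray).map { case (x, y) => x * y }.sum
  }

  // Cosine similarity using the L2 norm from Vectors.norm.
  def cosine(a: Vector, b: Vector): Double =
    dot(a, b) / (Vectors.norm(a, 2.0) * Vectors.norm(b, 2.0))

  def main(args: Array[String]): Unit = {
    val a = Vectors.dense(1.0, 0.0, 1.0)
    // Sparse vector of size 3 with non-zeros at indices 0 and 2.
    val b = Vectors.sparse(3, Array(0, 2), Array(2.0, 2.0))
    // The two vectors point in the same direction, so this is approximately 1.0.
    println(cosine(a, b))
  }
}
```

Calling `toArray` densifies sparse vectors, which is fine for small inputs but wasteful for very high-dimensional sparse data.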
In fact, you can find a cosine similarity implementation among the MLlib examples in the Spark repo: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala
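For the distributed case, a minimal sketch using `RowMatrix.columnSimilarities()` could look like this. It is not a copy of the linked example; the object name `ColumnSimilaritySketch`, the helper `similarities`, the sample data, and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object ColumnSimilaritySketch {
  // Returns (i, j, cosine) for pairs of columns i < j of a small sample matrix.
  def similarities(sc: SparkContext): Array[(Long, Long, Double)] = {
    // Each vector is one row; similarities are computed between columns.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 0.0),
      Vectors.dense(0.0, 1.0, 3.0),
      Vectors.dense(4.0, 0.0, 1.0)))
    val mat = new RowMatrix(rows)
    // The result is a CoordinateMatrix holding the upper triangle only.
    mat.columnSimilarities().entries
      .map(e => (e.i, e.j, e.value))
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ColumnSimilaritySketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      similarities(sc).foreach { case (i, j, v) =>
        println(s"cos(col $i, col $j) = $v")
      }
    } finally {
      sc.stop()
    }
  }
}
```

This route needs a SparkContext, so it only pays off when the matrix is too large to handle locally; for a handful of vectors the local approaches are simpler.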
Upvotes: 3