How to apply k-means to a Parquet file?

Question

I want to apply k-means to my Parquet file, but an error appears:

java.lang.ArrayIndexOutOfBoundsException: 2

code

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val Data = sqlContext.read.parquet("/usr/local/spark/dataset/norm")
// Build a 2-element feature vector from the columns at positions 1 and 2 of each row
val parsedData = Data.rdd.map(s => Vectors.dense(s.getDouble(1), s.getDouble(2))).cache()

val numClusters = 30
val numIteration = 1
val userClusterModel = KMeans.train(parsedData, numClusters, numIteration)
val userfeature1 = parsedData.first
val userCost = userClusterModel.computeCost(parsedData)
println("WSSSE for users: " + userCost)

How can I solve this error?

Salma Elzeheiry · Accepted Answer

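Row fields are zero-indexed. "ArrayIndexOutOfBoundsException: 2" means the rows have no field at index 2, so getDouble(2) asks for a column that does not exist. Read the first column as an Int and the second as a Double:

    // Column 0 holds Int values, column 1 holds Double values
    val parsedData = Data.rdd.map(s => Vectors.dense(s.getInt(0), s.getDouble(1))).cache()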
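If you do not know the column types in advance, one possible variant of the parsedData line from the accepted answer converts every field to Double by position, only up to row.length, so it cannot go past the last column and does not depend on whether a column is Int or Double. This is a minimal sketch that assumes every column is numeric and non-null. It uses SparkSession (Spark 2.0 or later) instead of sqlContext, and ParquetKMeans and rowToVector are hypothetical names. It keeps the RDD-based MLlib calls from the question (KMeans.train and computeCost).

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Row, SparkSession}

object ParquetKMeans {
  // Hypothetical helper: converts every field of a row to Double, whatever its
  // numeric type (Int, Long, Float, Double, ...). Assumes no field is null.
  def rowToVector(row: Row): Vector =
    Vectors.dense(Array.tabulate(row.length)(i => row.getAs[Number](i).doubleValue()))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParquetKMeans").getOrCreate()

    val data = spark.read.parquet("/usr/local/spark/dataset/norm")
    data.printSchema() // shows how many columns exist and what their types are

    // Only indices 0 until row.length are read, so no index can be out of bounds
    val parsedData = data.rdd.map(rowToVector).cache()

    val numClusters = 30
    val numIterations = 1
    val userClusterModel = KMeans.train(parsedData, numClusters, numIterations)

    // Within-set sum of squared errors of the trained model
    val userCost = userClusterModel.computeCost(parsedData)
    println("WSSSE for users: " + userCost)

    spark.stop()
  }
}
```

This clusters on every column in the file. If one of them is an identifier that should not influence the clusters, remove it first (for example with data.drop on that column) before calling data.rdd.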
