Garipaso

Reputation: 421

Array of Sequences in Scala

I'm trying to read the distinct values of each column in a DataFrame and store them in an array of sequences:

def getColumnDistinctValues(df: DataFrame, colNames: String): Unit = {
  val cols: Array[String] = colNames.split(',')
  cols.foreach(println) // print column names
  var colDistValues: Array[Seq[Any]] = null
  for (i <- 0 until cols.length) {
    // read distinct values from each column
    colDistValues(i) = df.select(cols(i)).distinct.map(x => x.get(0)).collect
  }
}

The assignment to colDistValues(i) always throws a NullPointerException. What is the correct way to assign the distinct values for each column?

Regards

Upvotes: 1

Views: 420

Answers (2)

Yuval Itzchakov

Reputation: 149518

You're indexing into a null reference (one you assigned yourself), so a NullPointerException is exactly what you should expect. You don't need to initialize an Array[T] beforehand; let the collection returned by map do that for you:

val colDistValues: Array[Array[Any]] = 
  cols.map(c => df.select(c).distinct.map(x => x.get(0)).collect)
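
For reference, here is a minimal sketch of the whole function built around that expression. It assumes Spark 2.x or later, where a DataFrame is a Dataset[Row] and calling map on it with x => x.get(0) would need an Encoder[Any], which Spark does not provide. Collecting first and unwrapping each Row on the driver avoids that. The object name ColumnDistinctValues and the changed return type (Array[Array[Any]] instead of Unit) are choices made for illustration, not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnDistinctValues {
  // Returns one array of distinct values per requested column,
  // in the same order as the comma-separated names in colNames.
  def getColumnDistinctValues(df: DataFrame, colNames: String): Array[Array[Any]] = {
    val cols: Array[String] = colNames.split(',')
    // collect() brings the distinct rows to the driver as Array[Row];
    // the second map runs on that local array, so no Encoder is involved.
    cols.map(c => df.select(c).distinct().collect().map(_.get(0)))
  }
}
```

Because map allocates the result array itself, there is no mutable variable and nothing that can be left as null.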

Upvotes: 5

user7220187

Reputation:

You are initialising colDistValues to null, so there is no array to index into.

Replace

var colDistValues: Array[Seq[Any]] = null

with

var colDistValues: Array[Seq[Any]] = Array.ofDim[Seq[Any]](cols.length)
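
A small sketch of what that preallocation does, in plain Scala without Spark so it can be run on its own. The names PreallocatedSlots, fill, and distinctOf are hypothetical; distinctOf stands in for the per-column DataFrame query. Array.ofDim creates the array with every slot set to null, and each slot is then assigned inside the loop.

```scala
object PreallocatedSlots {
  // Hypothetical stand-in for the per-column query:
  // distinct values of one in-memory column, in first-seen order.
  def distinctOf(column: Seq[Any]): Seq[Any] = column.distinct

  def fill(columns: Array[Seq[Any]]): Array[Seq[Any]] = {
    // Allocates columns.length slots; each slot is null until assigned below.
    val out: Array[Seq[Any]] = Array.ofDim[Seq[Any]](columns.length)
    for (i <- columns.indices) {
      out(i) = distinctOf(columns(i))
    }
    out
  }
}
```

Note that val is enough here: the array's contents change, but the reference itself is never reassigned.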

Upvotes: 2
