Reputation: 421
I'm trying to read the distinct values of each column in a DataFrame and store them in an array of sequences:
def getColumnDistinctValues(df: DataFrame, colNames: String): Unit = {
  val cols: Array[String] = colNames.split(',')
  cols.foreach(println) // print column names
  var colDistValues: Array[Seq[Any]] = null
  for (i <- 0 until cols.length) {
    colDistValues(i) = df.select(cols(i)).distinct.map(x => x.get(0)).collect // read distinct values from each column
  }
}
The assignment to colDistValues(i) doesn't work and always results in a NullPointerException. What is the correct syntax for assigning it the distinct values of each column?
Regards
Upvotes: 1
Views: 420
Reputation: 149518
You're trying to access the ith index of a null reference (which you assigned yourself), so of course you'll get a NullPointerException. You don't need to initialize an Array[T] beforehand; let the returned collection do that for you:
val colDistValues: Array[Array[Any]] =
  cols.map(c => df.select(c).distinct.map(x => x.get(0)).collect)
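One caveat, assuming you are on Spark 2.x or later (the question doesn't say which version is in use): there `distinct` returns a Dataset[Row], and mapping each row to `Any` before collecting may not compile because no Encoder exists for `Any`. A minimal sketch that sidesteps this by collecting the rows first and unwrapping the single column on the driver is below. The object name `DistinctValues` and the `trim` on the column names are my own additions for illustration, not part of the original code.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper object, used only to make the sketch self-contained.
object DistinctValues {
  // Returns one array of distinct values per requested column,
  // in the same order as the comma-separated names in colNames.
  def getColumnDistinctValues(df: DataFrame, colNames: String): Array[Array[Any]] = {
    // trim is an assumption: it tolerates input such as "a, b"
    val cols: Array[String] = colNames.split(',').map(_.trim)
    // collect() brings the distinct rows to the driver as Array[Row];
    // the map that follows is a plain Scala array map, so no Encoder is needed.
    cols.map(c => df.select(c).distinct().collect().map(_.get(0)))
  }
}
```

Because the function now returns the result instead of `Unit`, the caller can use it directly, e.g. `DistinctValues.getColumnDistinctValues(df, "x,y")`. Keep in mind that all distinct values end up in driver memory, so this only suits columns with modest cardinality.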
Upvotes: 5
Reputation:
You are initialising colDistValues to null.
Replace
var colDistValues: Array[Seq[Any]] = null
with
var colDistValues: Array[Seq[Any]] = Array.ofDim[Seq[Any]](cols.length)
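To see the difference in isolation, here is a small plain-Scala sketch with no Spark involved. The object and method names are hypothetical, and `Seq(cols(i))` is only a placeholder for the per-column query from the question.

```scala
import scala.util.Try

// Hypothetical demo object; none of these names come from the question.
object ArrayInitDemo {
  // Indexing into a null reference throws, because no array was ever allocated.
  def assignIntoNull(): Try[Unit] = {
    val unallocated: Array[Seq[Any]] = null
    Try { unallocated(0) = Seq(1) }
  }

  // Array.ofDim allocates cols.length slots (each initially null)
  // that can then be assigned one by one.
  def fillAllocated(cols: Array[String]): Array[Seq[Any]] = {
    val out = Array.ofDim[Seq[Any]](cols.length)
    for (i <- cols.indices) {
      out(i) = Seq(cols(i)) // placeholder for the distinct values of column i
    }
    out
  }
}
```

`ArrayInitDemo.assignIntoNull()` comes back as a `Failure` wrapping the NullPointerException, while `ArrayInitDemo.fillAllocated(Array("a", "b"))` returns a two-element array. Note that with `Array.ofDim` the variable no longer needs to be a `var`, since only the elements are reassigned and not the reference.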
Upvotes: 2