Reputation: 61
1) Initially filtered null values out of the RDD:
val rddWithOutNull2 = rddSlices.filter(x => x(0) != null)
2) Then converted this RDD to an RDD of Row.
3) Then converted that RDD to a DataFrame using Scala:
val df = spark.createDataFrame(rddRow,schema)
df.printSchema()
Output:
root
|-- name: string (nullable = false)
println(df.count())
Output:
Error thrown by count:
[Stage 11:==================================> (3 + 2) / 5][error] o.a.s.e.Executor - Exception in task 4.0 in stage 11.0 (TID 16)
java.lang.IndexOutOfBoundsException: 0
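For context, a minimal sketch of what appears to be happening (the names spark, the sample data, and the element type List[String] for rddSlices are assumptions): the filter is a lazy transformation, so printSchema succeeds because it never touches the data, and the exception only surfaces when count() forces evaluation and x(0) is applied to an empty slice.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("emptyRowRepro").getOrCreate()

// Assumed sample data: one slice is completely empty.
val rddSlices = spark.sparkContext.parallelize(Seq(List("alice"), List.empty[String], List("bob")))

// Lazy transformations: nothing is evaluated here, so no error yet.
val rddWithOutNull2 = rddSlices.filter(x => x(0) != null)
val rddRow = rddWithOutNull2.map(x => Row(x(0)))

val schema = StructType(Seq(StructField("name", StringType, nullable = false)))
val df = spark.createDataFrame(rddRow, schema)

df.printSchema()    // succeeds: only the schema is inspected, no data is read
println(df.count()) // the action runs the filter; x(0) on the empty slice throws IndexOutOfBoundsException: 0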
Upvotes: 1
Views: 950
Reputation: 7928
Agree with the comments, the problem seems to be in x(0). If there is an empty row, it will throw that Exception. One solution (depending on the type of the variable x) is to retrieve it with headOption:
val rddWithOutNull2 = rddSlices.filter(_.headOption.isDefined)
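For illustration, a small sketch of the difference (the sample values are assumptions): headOption returns None on an empty collection instead of throwing. If the intent is also to drop slices whose first element is null, as in the original filter, the two checks can be combined.
// headOption never throws on an empty collection.
List.empty[String].headOption   // None
List("alice").headOption        // Some(alice)

// Assumed combination of both checks: drop empty slices and slices starting with null.
val rddWithOutNull2 = rddSlices.filter(_.headOption.exists(_ != null))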
Upvotes: 1