Bryan K.

Reputation: 165

How to handle a ValueError from a DataFrame using Scala

I am developing a Spark application in Scala, and I have no background in Scala. I haven't hit a ValueError yet, but I want to prepare a handler for it in my code.

|location|arrDate|deptDate|
|JFK     |1201   |1209    |
|LAX     |1208   |1212    |
|NYC     |       |1209    |
|22      |1201   |1209    |
|SFO     |1202   |1209    |

Given data like this, I would like to store the third and fourth rows in Error.dat and then continue processing with the fifth row. In the error log, I would like to record information about the data, such as which file it came from, the row number, and the details of the error. For logging, I am currently using log4j.

What is the best way to implement this? Can you help me?

Upvotes: 0

Views: 153

Answers (1)

rogue-one

Reputation: 11587

I am assuming all three columns are of type String. In that case, I would solve this with the snippet below. I have created two UDFs to check for error records:

  • whether a field contains only numeric characters [isNumber]
  • whether a string field is empty [isEmpty]

Code snippet:

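 import org.apache.spark.sql.functions.udf
 // Outside spark-shell, also import spark.implicits._ (needed for toDF and $"...").

 // rdd is assumed to hold (location, arrDate, deptDate) tuples; row numbers start at 1.
 val df = rdd.zipWithIndex.map({case ((x,y,z),index) => (index+1,x,y,z)}).toDF("row_num", "c1", "c2", "c3")
 // True when the field contains only digits (an empty string also matches). Nulls are not handled.
 val isNumber = udf((x: String) => x.replaceAll("\\d","") == "")
 // True when the field is blank. Nulls are not handled.
 val isEmpty = udf((x: String) => x.trim.length==0)
 // Error rows: numeric location or blank arrival date. Valid rows: everything else.
 val errDF = df.filter(isNumber($"c1") || isEmpty($"c2"))
 val validDF = df.filter(!(isNumber($"c1") || isEmpty($"c2")))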


scala> df.show()
+-------+---+-----+-----+
|row_num| c1|   c2|   c3|
+-------+---+-----+-----+
|      1|JFK| 1201| 1209|
|      2|LAX| 1208| 1212|
|      3|NYC|     | 1209|
|      4| 22| 1201| 1209|
|      5|SFO| 1202| 1209|
+-------+---+-----+-----+

scala> errDF.show()
+-------+---+----+----+
|row_num| c1|  c2|  c3|
+-------+---+----+----+
|      3|NYC|    |1209|
|      4| 22|1201|1209|
+-------+---+----+----+
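
The question also asks for the error details (file, row number, reason) to be recorded. A minimal plain-Scala sketch of that part follows, using the same two checks as the UDFs above. The names `RowValidation`, `errorDetail`, `errorLine`, and the file name `input.dat` are hypothetical and only for illustration; they are not part of Spark or log4j.

```scala
// Hypothetical helper, not from the original answer: same checks as
// isNumber / isEmpty above, but returning a reason instead of a Boolean.
object RowValidation {
  // Returns Some(reason) when the row should be rejected, None when it is valid.
  def errorDetail(location: String, arrDate: String): Option[String] = {
    if (location == null || location.trim.isEmpty) Some("location is empty")
    else if (location.trim.forall(c => c >= '0' && c <= '9'))
      Some(s"location '$location' is numeric")
    else if (arrDate == null || arrDate.trim.isEmpty) Some("arrDate is empty")
    else None
  }

  // Builds one line of text describing the rejected row.
  def errorLine(file: String, rowNum: Long, detail: String): String =
    s"file=$file row=$rowNum error=$detail"

  // Sample data from the question: (row_num, location, arrDate).
  val sample: Seq[(Long, String, String)] = Seq(
    (1L, "JFK", "1201"),
    (2L, "LAX", "1208"),
    (3L, "NYC", ""),
    (4L, "22", "1201"),
    (5L, "SFO", "1202")
  )

  // "input.dat" is an assumed file name for the example.
  val errors: Seq[String] = sample.flatMap { case (n, loc, arr) =>
    errorDetail(loc, arr).map(d => errorLine("input.dat", n, d))
  }
}
```

Each string in `RowValidation.errors` could then be passed to a log4j logger or appended to Error.dat. To stay inside Spark, `errorDetail` could presumably be wrapped with `udf` and added as an extra column on `errDF`, though that variant is untested here.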

Upvotes: 1
