Reputation: 193
I have a DataFrame which contains several records,
I want to iterate each row of this DataFrame in order to validate the data of each of its columns, doing something like the following code:
val validDF = dfNextRows.map {
  x => ValidateRow(x)
}
def ValidateRow(row: Row): Boolean = {
  val nC = row.getString(0)
  val si = row.getString(1)
  val iD = row.getString(2)
  val iH = row.getString(3)
  val sF = row.getString(4)
  // Stuff to validate the data field of each row
  validateNC(nC)
  validateSI(si)
  validateID(iD)
  validateIF(iH)
  validateSF(sF)
  true
}
But, doing some tests, when I try to print the value of the val nC (to be sure that I'm sending the correct information to each function), it doesn't print anything:
def ValidateRow(row: Row): Boolean = {
  val nC = row.getString(0)
  val si = row.getString(1)
  val iD = row.getString(2)
  val iH = row.getString(3)
  val sF = row.getString(4)
  println(nC)
  validateNC(nC)
  validateSI(si)
  validateID(iD)
  validateIF(iH)
  validateSF(sF)
  true
}
How can I know that I'm sending the correct information to each function (that I'm reading the data of each column of the row correctly)?
Regards.
Upvotes: 1
Views: 7388
Reputation: 41957
The built-in Spark DataFrame functions should give you a good start.
If your validate functions are simple enough (like checking for null values), then you can embed the functions as
dfNextRows.withColumn("num_cta", when(col("num_cta").isNotNull, col("num_cta")).otherwise(lit(0)))
You can do the same for the other columns in the same manner, just by using the appropriate Spark DataFrame functions.
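To see what that when/otherwise expression does per value, here is a plain-Scala mirror of the same null-defaulting logic (the function name and the default of 0 are illustrative, not part of Spark's API); Option.getOrElse plays the role of otherwise(lit(0)):

```scala
// Mirror of: when(col("num_cta").isNotNull, col("num_cta")).otherwise(lit(0))
// A nullable value (java.lang.Long, like a nullable column) is kept when
// present and replaced by 0 when null.
def defaultIfNull(value: java.lang.Long): Long =
  Option(value).map(_.toLong).getOrElse(0L)

println(defaultIfNull(42L))  // 42
println(defaultIfNull(null)) // 0
```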
If your validation rules are complex, then you can use udf functions as
def validateNC = udf((num_cta: Long) => {
  // define your rules here and return a value, e.g. whether num_cta is valid
  num_cta >= 0L
})
You can call the udf function using withColumn as
dfNextRows.withColumn("num_cta", validateNC(col("num_cta")))
You can do the same for the rest of your validation rules.
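Since the question doesn't show the actual business rules, here is a minimal sketch of one such rule as an ordinary Scala function (the name and the non-negative/10-digit constraint are made-up assumptions); keeping the rule as a plain function lets you unit-test it before wrapping it in a udf:

```scala
// Hypothetical rule for num_cta: non-negative and at most 10 digits.
val isValidNumCta: Long => Boolean = n => n >= 0L && n <= 9999999999L

// In Spark you would then wrap and apply it like this (requires a
// SparkSession, so shown as comments):
//   val validateNC = udf(isValidNumCta)
//   dfNextRows.withColumn("num_cta_valid", validateNC(col("num_cta")))

println(isValidNumCta(12345L)) // true
println(isValidNumCta(-1L))    // false
```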
I hope to see your problem get resolved soon.
Upvotes: 3