Reputation: 3594
I'm using the Typelevel Frameless library to introduce strongly typed Datasets into some legacy code, which previously worked with no problems. I'm on Frameless 0.4.1 with Spark 2.2, running on EMR and interacting with a Hive DB. When I run the code below, deserialized.map on a TypedDataset appears not to execute at all, and the data within the Dataset is invalidated, with every value set to null. I thought it was a problem with the encoder, but the filter function within the same block runs fine:
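For reference, the two case classes are shaped roughly like this. Only is_control and customerId appear in the snippet below; the field types and the remaining fields are placeholders, not the real definitions:
case class MyDataClass(
  is_control: Boolean,
  customerId: String // assumed type; several more fields omitted
  // ...
)

case class MyMappedDataClass(
  is_control: Boolean,
  customerId: String // assumed type; a subset of the MyDataClass fields
  // ...
)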
def generate(
  combinedDS: TypedDataset[MyDataClass],
  isControl: Boolean
): TypedDataset[MyMappedDataClass] = {
  val filteredCombinedDS = combinedDS.filter(combinedDS('is_control) === isControl)
  println(s"After filter count: ${filteredCombinedDS.count().run()}") // count is 1000
  println("after the combined: ")
  filteredCombinedDS.show(6).run() // runs fine, shows valid rows/columns with data present

  val resultDS = filteredCombinedDS.deserialized.map { row =>
    throw new Exception(s"Error out on the first row, show me the row: ${row}") // never thrown
    // ... simple 1-to-1 mappings extracting fields from MyDataClass into a MyMappedDataClass.
    // The map returns the same number of rows that filteredCombinedDS has,
    // but every row/column is just filled with nulls.
    MyMappedDataClass(
      row.is_control,
      row.customerId
      // etc...
    )
  }

  println(s"Count after map: ${resultDS.count().run()}") // count is 1000
  resultDS.show(6).run() // runs, shows the MyMappedDataClass schema, but all rows/columns are null
  resultDS
}
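For completeness, this is roughly how the function is driven; the session setup and the Hive table name here are placeholders rather than the actual code:
import frameless.TypedDataset
import org.apache.spark.sql.SparkSession

implicit val spark: SparkSession =
  SparkSession.builder().enableHiveSupport().getOrCreate()
import spark.implicits._

// Read the source rows (table name is a placeholder) and wrap them in a TypedDataset.
val combinedDS: TypedDataset[MyDataClass] =
  TypedDataset.create(spark.table("my_schema.my_table").as[MyDataClass])

val mapped = generate(combinedDS, isControl = true)
mapped.show(6).run() // same symptom: schema is correct, but values are all null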
Does anyone know what would cause the map function to just skip running? I've even tried to force an exception to be thrown in the map function itself, just to prove that it isn't running. What would cause the map portion to just be glossed over?
Upvotes: 2
Views: 81