Reputation: 1388
Some how in Spark2.0, I can use Dataframe.map(r => r.getAs[String]("field"))
without problems
But DataSet.map(r => r.getAs[String]("field"))
gives error that r
doesn't have the "getAs" method.
What's the difference between r
in DataSet
and r
in DataFrame
and why r.getAs
only works with DataFrame
?
After doing some research in StackOverflow, I found a helpful answer here
Encoder error while trying to map dataframe row to updated row
Hope it's helpful
Upvotes: 0
Views: 66
Reputation: 37822
Dataset
has a type parameter: class Dataset[T]
. T
is the type of each record in the Dataset. That T
might be anything (well, anything for which you can provide an implicit Encoder[T]
, but that's besides the point).
A map
operation on a Dataset
applies the provided function to each record, so the r
in the map operations you showed will have the type T
.
Lastly, DataFrame
is actually just an alias for Dataset[Row]
, which means each record has the type Row
. And Row
has a method named getAs
that takes a type parameter and a String argument, hence you can call getAs[String]("field")
on any Row
. For any T
that doesn't have this method - this will fail to compile.
Upvotes: 4