cozyss

Reputation: 1388

What's the difference between Dataset.map(r => xx) and DataFrame.map(r => xx) in Spark 2.0?

Somehow in Spark 2.0, I can use DataFrame.map(r => r.getAs[String]("field")) without problems.

But Dataset.map(r => r.getAs[String]("field")) gives an error saying that r doesn't have the getAs method.

What's the difference between r in a Dataset and r in a DataFrame, and why does r.getAs only work with a DataFrame?

After doing some research on Stack Overflow, I found a helpful answer here:

Encoder error while trying to map dataframe row to updated row

Hope it's helpful

Upvotes: 0

Views: 66

Answers (1)

Tzach Zohar

Reputation: 37822

Dataset has a type parameter: class Dataset[T]. T is the type of each record in the Dataset. That T might be anything (well, anything for which you can provide an implicit Encoder[T], but that's beside the point).

A map operation on a Dataset applies the provided function to each record, so the r in the map operations you showed will have the type T.
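For example, here is a minimal sketch of a typed Dataset (the Person case class, the local master, and the app name are hypothetical, introduced only for illustration): because T is Person, the function passed to map receives a Person, not a Row.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class used only to illustrate a typed Dataset[T]
case class Person(name: String, age: Int)

object TypedMapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-map").getOrCreate()
    import spark.implicits._

    // T = Person here, so the function passed to map receives a Person
    val ds: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()

    // Fields are accessed directly; Person has no getAs method
    val names: Dataset[String] = ds.map(p => p.name)
    names.show()

    spark.stop()
  }
}
```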

Lastly, DataFrame is actually just an alias for Dataset[Row], which means each record has the type Row. And Row has a method named getAs that takes a type parameter and a String argument, hence you can call getAs[String]("field") on any Row. For any T that doesn't have this method, the call will fail to compile.
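To make the contrast concrete, here is a second sketch under the same assumptions (hypothetical Person case class and local SparkSession): converting the typed Dataset to a DataFrame changes the record type to Row, which is exactly why getAs becomes available.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Same hypothetical case class as in the previous sketch
case class Person(name: String, age: Int)

object RowGetAsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-getAs").getOrCreate()
    import spark.implicits._

    val ds: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()
    val df: DataFrame = ds.toDF() // a DataFrame is just Dataset[Row]

    // Each record of df is a Row, so getAs[String]("name") compiles and works
    val namesFromRows: Dataset[String] = df.map((r: Row) => r.getAs[String]("name"))
    namesFromRows.show()

    // ds.map(p => p.getAs[String]("name"))  // would not compile: Person has no getAs method

    spark.stop()
  }
}
```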

Upvotes: 4
