knowone

Reputation: 840

Get multiple columns within a map on an RDD

I have a DataFrame that I'm explicitly converting into an RDD, and I'm trying to fetch the value of each column for every record. I can't get all of them back from a single map. Below is what I've tried:

val df = sql("Select col1, col2, col3, col4, col5 from tableName").rdd

The resulting df is of type org.apache.spark.rdd.RDD[org.apache.spark.sql.Row].

Now I'm trying to access several columns of each Row in this RDD via:

val dfrdd = df.map{x => x.get(0); x.getAs[String](1); x.get(3)}

The issue is that the above statement returns only the value of the last expression inside the map, i.e. the value of x.get(3). Can someone tell me what I'm doing wrong?
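To reproduce the behaviour without a Spark cluster, here is a minimal plain-Scala sketch. It assumes each record is a Seq[Any], a hypothetical stand-in for org.apache.spark.sql.Row, with x(i) playing the role of x.get(i). The lambda has the same shape as the one in the question:

```scala
object BlockValueDemo {
  // Hypothetical stand-in for an RDD[Row]: each inner Seq is one "row".
  val rows: Seq[Seq[Any]] = Seq(
    Seq(1, "a", true, 10L),
    Seq(2, "b", false, 20L)
  )

  // Three expressions separated by ';'. The first two are evaluated and their
  // results discarded; the block's value is the last expression, x(3).
  val lastOnly: Seq[Any] = rows.map { x => x(0); x(1); x(3) }

  // The same rule applies to any block, not just lambdas passed to map.
  val blockValue: Int = { 1; 2; 3 }
}
```

Only the fourth column survives the map, which matches what the question observes with x.get(3). The compiler may warn about the discarded expressions, which is a useful hint that they have no effect.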

Upvotes: 3

Views: 6860

Answers (1)

koiralo

Reputation: 23109

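In Scala a block evaluates to its last expression, so the function you pass to map returns only x.get(3). x.get(0) and x.getAs[String](1) are evaluated, but their results are discarded.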

To return multiple values, return a tuple instead:

val dfrdd = df.map{x => (x.get(0), x.getAs[String](1), x.get(3))}
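Here is a minimal plain-Scala sketch of the tuple approach, again assuming each record is a Seq[Any] standing in for Row. Here asInstanceOf[T] plays the role of Row's getAs[T](i); the column types (Int, String, Long) are assumptions made up for the example:

```scala
object TupleDemo {
  // Hypothetical stand-in for an RDD[Row] with four positional columns.
  val rows: Seq[Seq[Any]] = Seq(
    Seq(1, "a", true, 10L),
    Seq(2, "b", false, 20L)
  )

  // The lambda's last (and only) expression is a tuple, so all three
  // selected columns are kept, each with a concrete static type.
  val triples: Seq[(Int, String, Long)] = rows.map { x =>
    (x(0).asInstanceOf[Int], x(1).asInstanceOf[String], x(3).asInstanceOf[Long])
  }

  // Tuples can be destructured afterwards to work with individual columns.
  val secondColumn: Seq[String] = triples.map { case (_, s, _) => s }
}
```

With a real Row, the typed getAs[T](i) calls give you an RDD of typed tuples instead of an RDD[(Any, String, Any)]; using get(i) as in the answer also works but leaves those elements typed as Any.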

Hope this helped!

Upvotes: 5
