Retrieve the value of a column at a specific row number in a Spark dataset

Question

I have a dataset like the one below:

+---------+
| column1 |
+---------+
| ABC     |
+---------+
| DEF     |
+---------+
| GHI     |
+---------+
| JKL     |
+---------+
| MNO     |
+---------+

Now I have to get the column value of the 4th row, which is JKL. Is there any way to get that directly? I normally do it as below:

String dataTemp = df.select("column1").collectAsList().get(3).getAs("column1").toString();

But I don't want to collect the whole dataset as a list every time, since that can cause issues with large datasets.

pasha701 · Accepted Answer

You can collect just a limited number of rows with "take". In Scala:

val fourthRow = df.select("column1").take(4).last

If the row number is large, you can switch to the RDD instead. Note that zipWithIndex is zero-based, so the 4th row has index 3:

val fourthRow = df.rdd.zipWithIndex().filter(_._2 == 3).keys.collect().head
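
To make the RDD approach concrete, here is a minimal, self-contained sketch. It is only an illustration under a few assumptions: the helper name valueAt and the object name NthRowExample are hypothetical, the column is assumed to hold strings, and the local SparkSession is there only to build the sample data from the question. It also assumes the rows arrive in the order shown; Spark does not guarantee row order unless the data is explicitly sorted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NthRowExample {
  // Hypothetical helper: value of `column` at zero-based position `n`, if it exists.
  // Assumes `column` is a string column.
  def valueAt(df: DataFrame, column: String, n: Long): Option[String] =
    df.select(column)
      .rdd
      .zipWithIndex()                       // pair each Row with a zero-based Long index
      .filter { case (_, idx) => idx == n } // keep only the requested position
      .keys                                 // drop the index, keep the Row
      .map(_.getString(0))                  // read the single selected column
      .take(1)                              // at most one value is sent to the driver
      .headOption

  def main(args: Array[String]): Unit = {
    // Local session used only to reproduce the sample data from the question.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nth-row-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("ABC", "DEF", "GHI", "JKL", "MNO").toDF("column1")

    // The 4th row has index 3 because zipWithIndex starts at 0.
    println(valueAt(df, "column1", 3L))

    spark.stop()
  }
}
```

Returning an Option avoids the exception that head would throw when the dataset has fewer rows than the requested position, and take(1) keeps the amount of data moved to the driver to a single value.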
