John Humanyun

Reputation: 945

Retrieve the value at a specific row number of a column in a Spark Dataset

I have a dataset like the one below:

+---------+
| column1 |
+---------+
| ABC     |
+---------+
| DEF     |
+---------+
| GHI     |
+---------+
| JKL     |
+---------+
| MNO     |
+---------+

Now I need to get the column value in the 4th row, which is JKL. Is there any way to get it directly? I normally do it as below:

String dataTemp = df.select("column1").collectAsList().get(3).getAs("column1").toString();

But I don't want to collect the whole dataset as a list every time, since that can cause problems with large datasets.

Upvotes: 1

Views: 1586

Answers (2)

pasha701

Reputation: 7207

A limited number of rows can be collected with "take". In Scala:

val fourthRow = df.select("column1").take(4).last

If the row number is large, switching to the RDD API is possible. Note that zipWithIndex is zero-based, so the 4th row has index 3:

val fourthRow = df.rdd.zipWithIndex().filter(_._2 == 3).keys.collect().head
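
To make the zipWithIndex approach concrete, here is a minimal self-contained sketch. The object name NthRowSketch, the helper nthValue, and the local SparkSession are assumptions for illustration and are not part of the original answer; it also assumes the column holds non-null strings. Since zipWithIndex assigns 0-based indices, the n-th row (counting from 1) sits at index n - 1.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NthRowSketch {
  // Hypothetical helper: value of `column` in the n-th row (1-based), if that row exists.
  // Assumes `column` is a non-null string column.
  def nthValue(df: DataFrame, column: String, n: Long): Option[String] =
    df.select(column).rdd
      .zipWithIndex()                              // pairs each Row with a 0-based Long index
      .filter { case (_, idx) => idx == n - 1 }    // keep only the requested position
      .map { case (row, _) => row.getString(0) }
      .take(1)                                     // at most one value is sent to the driver
      .headOption

  def main(args: Array[String]): Unit = {
    // Local session purely for demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nth-row-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("ABC", "DEF", "GHI", "JKL", "MNO").toDF("column1")
    println(nthValue(df, "column1", 4))

    spark.stop()
  }
}
```

Only the matching value is returned to the driver, not the whole dataset. Keep in mind that zipWithIndex itself triggers a Spark job when the RDD has more than one partition, and that the indices only reflect the current partition order of the DataFrame.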

Upvotes: 2

blackbishop

Reputation: 32640

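Use row_number to assign each row an index and then select the row with rn = 4:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark

val row = df.withColumn("rn", row_number().over(Window.orderBy(lit(1))))
            .filter("rn = 4")
            .select($"column1").first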
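
Ordering the window by a constant does not by itself define which row counts as the 4th. As a hedged variation, the sketch below orders by monotonically_increasing_id(), which follows the current partition order of the DataFrame. The object name RowNumberSketch and the helper nthRowValue are hypothetical, and the sketch assumes a non-null string column and that no column named "rn" already exists.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

object RowNumberSketch {
  // Hypothetical helper: value of `column` in the n-th row (1-based), if that row exists.
  def nthRowValue(df: DataFrame, column: String, n: Int): Option[String] = {
    // No partitionBy: Spark moves all rows into a single partition for this window.
    val w = Window.orderBy(monotonically_increasing_id())

    df.withColumn("rn", row_number().over(w))  // rn starts at 1
      .filter(col("rn") === n)
      .select(column)
      .take(1)                                 // at most one Row reaches the driver
      .headOption
      .map(_.getString(0))
  }
}
```

With the question's dataset, a call such as RowNumberSketch.nthRowValue(df, "column1", 4) would be the equivalent of the snippet above. Because the window has no partitioning, this can be slow on large data; if the dataset has a real ordering column, ordering by that column is the more reliable choice.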

Upvotes: 1
