Reputation: 945
I have a dataset as below
+---------+
| column1 |
+---------+
| ABC     |
| DEF     |
| GHI     |
| JKL     |
| MNO     |
+---------+
Now I have to get the column value of the 4th row, which is JKL. Is there any way to get that directly?
I normally do it as below:
String dataTemp = df.select("column1").collectAsList().get(3).getAs("column1").toString();
But I don't want to collect the whole column as a list every time, since that can cause issues when dealing with large datasets.
Upvotes: 1
Views: 1586
Reputation: 7207
With "take", only a limited number of rows is collected to the driver. In Scala:
val fourthRow = df.select("column1").take(4).last
If the row number is large, switching to the RDD is possible. Note that zipWithIndex is 0-based, so the 4th row has index 3:
val fourthRow = df.rdd.zipWithIndex().filter(_._2 == 3).keys.collect().head
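For reuse, the RDD approach can be wrapped in a small helper. This is only a minimal sketch: the names nthRow and NthRowExample are hypothetical, the local SparkSession setup is an assumption for demonstration, and it relies on the DataFrame's current row order, which Spark does not guarantee unless the data is explicitly sorted. Since zipWithIndex assigns 0-based indices, the n-th row (1-based) has index n - 1.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object NthRowExample {
  // Hypothetical helper: returns the n-th row (1-based),
  // or None if the DataFrame has fewer than n rows.
  def nthRow(df: DataFrame, n: Long): Option[Row] = {
    require(n >= 1, "n is 1-based")
    df.rdd
      .zipWithIndex()                            // pairs each Row with a 0-based index
      .filter { case (_, idx) => idx == n - 1 }  // keep only the requested position
      .map { case (row, _) => row }
      .take(1)                                   // at most one row reaches the driver
      .headOption
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the helper out
    val spark = SparkSession.builder().master("local[*]").appName("nth-row").getOrCreate()
    import spark.implicits._

    val df = Seq("ABC", "DEF", "GHI", "JKL", "MNO").toDF("column1")
    val fourth = nthRow(df.select("column1"), 4).map(_.getString(0))
    println(fourth)

    spark.stop()
  }
}
```

Returning an Option avoids the exception that .head would throw when the DataFrame has fewer rows than requested.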
Upvotes: 2
Reputation: 32640
Use row_number to assign each row an index and then select the row with rn = 4:
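import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}
// Note: a window without partitionBy moves all rows to a single partition,
// and ordering by a constant leaves the actual row order up to Spark.
val row = df.withColumn("rn", row_number().over(Window.orderBy(lit(1))))
  .filter("rn = 4")
  .select("column1").first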
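If the numbering should follow the DataFrame's current row order rather than an arbitrary one, a possible variation is to order the window by an id column generated with monotonically_increasing_id. This is a sketch under assumptions, not a definitive implementation: the helper name nthValue and the temporary column names _id and _rn are hypothetical, and the generated ids are only increasing relative to the current partition layout. The window still has no partitionBy, so all rows are moved to a single partition.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

object RowNumberExample {
  // Hypothetical helper: value of `column` in the n-th row (1-based) as a String,
  // or None if there is no such row or the value is null.
  def nthValue(df: DataFrame, column: String, n: Int): Option[String] = {
    df.withColumn("_id", monotonically_increasing_id())               // assumed-free temporary column
      .withColumn("_rn", row_number().over(Window.orderBy(col("_id")))) // 1-based position
      .filter(col("_rn") === n)
      .select(col(column))
      .take(1)                                                        // at most one row on the driver
      .headOption
      .flatMap(r => Option(r.get(0)).map(_.toString))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the helper out
    val spark = SparkSession.builder().master("local[*]").appName("row-number").getOrCreate()
    import spark.implicits._

    val df = Seq("ABC", "DEF", "GHI", "JKL", "MNO").toDF("column1")
    println(nthValue(df, "column1", 4))

    spark.stop()
  }
}
```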
Upvotes: 1