b-ryce

Reputation: 5828

Limit rows with Scala Spark

I have a dataset that looks right. I can see all my rows ordered correctly like this:

df1.orderBy($"count".desc)
df1.show()

But when I try to add a limit like this:

df1.orderBy($"count".desc).limit(5)
df1.show()

I'm still getting all the rows. I think I'm following the docs correctly... https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html#limit(n:Int):org.apache.spark.sql.Dataset[T]

So how do I limit?

Upvotes: 0

Views: 1130

Answers (1)

happydave

Reputation: 7187

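Dataset methods return a new Dataset object; they don't mutate the existing one. Calling df1.orderBy(...).limit(5) and then df1.show() still shows the original df1. You need to keep the returned Dataset and show that instead:

val df2 = df1.orderBy($"count".desc).limit(5)
df2.show()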
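
For a fuller picture, here is a minimal sketch you could paste into spark-shell or run as a Scala script with Spark on the classpath. The sample data, the column name `word`, and the names `unchanged` and `top5` are made up for illustration; only `orderBy`, `limit`, `show`, and `count` come from the Dataset API discussed above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for demonstration. In spark-shell a session
// already exists and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("limit-demo")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the asker's df1
val df1 = Seq(
  ("a", 3L), ("b", 10L), ("c", 7L), ("d", 1L),
  ("e", 5L), ("f", 8L), ("g", 2L)
).toDF("word", "count")

// The returned Dataset is discarded here, so df1 itself is unchanged
df1.orderBy($"count".desc).limit(5)
val unchanged = df1.count() // df1 still has all 7 rows

// Keep the returned Dataset and run the action on it
val top5 = df1.orderBy($"count".desc).limit(5)
top5.show()

// Or chain the action directly onto the transformations
df1.orderBy($"count".desc).limit(5).show()
```

Chaining is the shortest option when you only need to print the rows; assigning to a val is more useful if you want to reuse the limited Dataset later.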

Upvotes: 1
