Reputation: 5828
I have a dataset that looks right. I can see all my rows ordered correctly like this:
df1.orderBy($"count".desc)
df1.show()
But when I try and add a limit like this:
df1.orderBy($"count".desc).limit(5)
df1.show()
I'm still getting all the rows. I think I'm following the docs correctly: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html#limit(n:Int):org.apache.spark.sql.Dataset[T]
So how do I limit?
Upvotes: 0
Views: 1130
Reputation: 7187
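Dataset methods such as orderBy and limit return a new Dataset object; they don't mutate the existing one. In your code the limited result is discarded and df1.show() prints the original, unchanged df1. So you need to keep the returned Dataset and show that: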
val df2 = df1.orderBy($"count".desc).limit(5)
df2.show()
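For a fuller picture, here is a minimal sketch you could paste into a script or adapt for spark-shell. The sample data, the column name "word", the app name, and the local master setting are all made up for illustration; only orderBy, limit, and show come from your question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("limit-example")
  .getOrCreate()
import spark.implicits._ // enables the $"col" syntax and toDF

// Hypothetical sample data standing in for df1 (7 rows).
val df1 = Seq(
  ("a", 3L), ("b", 10L), ("c", 7L), ("d", 1L),
  ("e", 9L), ("f", 4L), ("g", 12L)
).toDF("word", "count")

// orderBy and limit each return a NEW Dataset; df1 itself is untouched.
val top5 = df1.orderBy($"count".desc).limit(5)

top5.show() // shows the 5 rows with the largest counts
df1.show()  // still shows all 7 rows, in their original order
```

If you don't need the intermediate value, chaining the call works too: df1.orderBy($"count".desc).limit(5).show()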
Upvotes: 1