Reputation: 3099
Given an RDD in pyspark, I would like to make a new RDD which only contains (a copy of) its first n items, something like:
n=100
rdd2 = rdd1.limit(n)
except that RDD does not have a limit() method, like DataFrame does.
Note that I do not want to collect the result; it must still be an RDD, so I cannot use RDD.take().
I am using PySpark 2.4.4.
Upvotes: 1
Views: 533
Reputation: 1174
You can convert the RDD to a DataFrame, apply limit(), and convert the result back to an RDD:
rdd1.toDF().limit(n).rdd
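If going through a DataFrame is inconvenient (toDF() needs elements Spark can infer a schema from, such as tuples or Rows, and .rdd hands back Row objects rather than the original items), one alternative is to index the elements and filter on the index. This is only a minimal sketch: limit_rdd and demo are hypothetical names made up for illustration, not part of PySpark, while zipWithIndex(), filter() and keys() are standard RDD methods.

```python
def limit_rdd(rdd, n):
    """Return a new RDD holding the first n elements of rdd (hypothetical helper)."""
    # zipWithIndex() pairs each element with its position: (element, index).
    indexed = rdd.zipWithIndex()
    # Keep only the pairs whose index is below n; nothing is collected to the driver.
    first_n = indexed.filter(lambda pair: pair[1] < n)
    # Drop the index again so the elements keep their original type.
    return first_n.keys()


def demo():
    """Small local check; assumes PySpark is installed and a local context can start."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd1 = sc.parallelize(range(1000), 8)
    rdd2 = limit_rdd(rdd1, 100)  # still an RDD, not a Python list
    print(rdd2.count())
    sc.stop()
```

Two caveats: zipWithIndex() launches a Spark job when the RDD has more than one partition, and the filter still scans every partition, so this is likely slower than DataFrame.limit() on large data. "First" here means the order zipWithIndex() assigns, i.e. by partition index and then by position within each partition.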
Upvotes: 2