How to make an RDD from the first n items of another RDD in Spark?

Question

Given an RDD in pyspark, I would like to make a new RDD which only contains (a copy of) its first n items, something like:

n = 100
rdd2 = rdd1.limit(n)

except that RDD, unlike DataFrame, has no limit() method.

Note that I do not want to collect the result. It must still be an RDD, so I cannot use RDD.take().

I am using pyspark 2.44.

Paul · Accepted Answer

You can convert the RDD to a DataFrame, apply limit(), and convert the result back to an RDD (its elements will be Row objects):

rdd1.toDF().limit(n).rdd
