Fanta

Reputation: 3099

How to make an RDD from the first n items of another RDD in Spark?

Given an RDD in pyspark, I would like to make a new RDD which only contains (a copy of) its first n items, something like:

n = 100
rdd2 = rdd1.limit(n)

except that RDD, unlike DataFrame, has no limit() method.

Note that I do not want to collect the result. It must still be an RDD, so I cannot use RDD.take().

I am using pyspark 2.44.

Upvotes: 1

Views: 533

Answers (1)

Paul

Reputation: 1174

You can convert the RDD to a DataFrame, apply limit(), and convert the result back to an RDD:

rdd1.toDF().limit(n).rdd
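Keep in mind that the DataFrame round trip returns an RDD of Row objects, and toDF() needs elements whose schema Spark can infer (tuples or Rows, for example). If that is a problem, one alternative that stays in the RDD API is to index the elements and filter on the index. The following is only a minimal sketch: the helper name take_as_rdd is made up for illustration, and it assumes rdd1 already exists. zipWithIndex() runs a Spark job when the RDD has more than one partition, and the filter still scans every partition, so this is not necessarily cheaper than the DataFrame route.

```python
def take_as_rdd(rdd, n):
    """Return a new RDD with the first n elements of rdd, without collecting.

    take_as_rdd is a hypothetical helper name, not part of the PySpark API.
    """
    return (
        rdd.zipWithIndex()                    # pair each element with its position: (element, index)
           .filter(lambda pair: pair[1] < n)  # keep positions 0 .. n-1
           .map(lambda pair: pair[0])         # drop the index again
    )

# Usage, assuming an existing SparkContext named sc:
#   rdd1 = sc.parallelize(range(1000))
#   rdd2 = take_as_rdd(rdd1, 100)   # rdd2 is still an RDD
```

The elements of rdd2 keep their original type here, since nothing is converted to Row.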

Upvotes: 2
