Biarys

Reputation: 1173

What's the difference between element and partition in Spark?

I tried googling, but couldn't find an answer.

Taken from Apache Spark: map vs mapPartitions?

What's the difference between an RDD's map and mapPartitions?

map applies the function to each element, while mapPartitions applies the function to each partition.

In this context, what is element level? Is it just an individual row?

Upvotes: 2

Views: 189

Answers (1)

Ram Ghadiyaram

Reputation: 29155

In layman's terms, imagine a shelf with 10 racks and 100 balls, as shown in the picture. You place 10 balls on each rack, so the 100 balls are spread across the 10 racks. That is what balldata.repartition(10) does: it distributes the data uniformly rather than putting all 100 balls on one or two racks.
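To make the rack picture concrete, here is a minimal sketch in plain Python (no Spark involved). The helper `split_into_partitions` and the round-robin dealing are assumptions made for illustration only; Spark's own `repartition` performs a shuffle and does not guarantee this exact placement.

```python
# Plain-Python stand-in for data split into partitions (not Spark itself).
def split_into_partitions(elements, num_partitions):
    """Deal elements round-robin into num_partitions lists (hypothetical helper)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, element in enumerate(elements):
        partitions[i % num_partitions].append(element)
    return partitions

balls = list(range(100))                    # 100 elements (rows)
racks = split_into_partitions(balls, 10)    # 10 partitions (racks)

print(len(racks))                           # number of partitions
print([len(rack) for rack in racks])        # number of elements in each partition
```

With a real PySpark RDD, `rdd.getNumPartitions()` and `rdd.glom().collect()` give a comparable view of how elements are grouped into partitions.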

Now, instead of applying your logic to each ball (element or row), you apply it once to each rack (partition). That is the difference.

In this case, an element is a ball (a single row) and a partition is a rack.

The advantage is that if your processing logic needs heavy initialization, such as opening a database connection, you open one connection per partition (rack :-)) rather than one connection per element (ball :-)).
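The connection-count difference can be sketched as follows. This is a plain-Python simulation, not Spark code: `FakeConnection`, `process_element`, and `process_partition` are hypothetical names, and the partitions are ordinary lists. The two functions do have the shapes that PySpark's `rdd.map` (one element in, one value out) and `rdd.mapPartitions` (an iterator in, an iterator out) expect.

```python
class FakeConnection:
    """Hypothetical stand-in for an expensive resource such as a DB connection."""
    opened = 0  # counts how many connections have been created

    def __init__(self):
        FakeConnection.opened += 1

    def lookup(self, ball):
        return ball * 2  # placeholder for real per-row work

    def close(self):
        pass


def process_element(ball):
    # Shape used with rdd.map: called once per element.
    conn = FakeConnection()
    try:
        return conn.lookup(ball)
    finally:
        conn.close()


def process_partition(balls):
    # Shape used with rdd.mapPartitions: called once per partition,
    # receives an iterator over that partition's elements.
    conn = FakeConnection()
    try:
        for ball in balls:
            yield conn.lookup(ball)
    finally:
        conn.close()


# 10 racks (partitions) of 10 balls (elements) each.
racks = [list(range(i * 10, (i + 1) * 10)) for i in range(10)]

FakeConnection.opened = 0
per_element = [process_element(b) for rack in racks for b in rack]
opened_per_element = FakeConnection.opened

FakeConnection.opened = 0
per_partition = [out for rack in racks for out in process_partition(iter(rack))]
opened_per_partition = FakeConnection.opened

print(opened_per_element, opened_per_partition)
```

Both approaches produce the same results; only the number of times the setup code runs differs. On a real RDD the calls would be `rdd.map(process_element)` and `rdd.mapPartitions(process_partition)`.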

I advise you to go through the examples given there to understand this better.

[Image: a shelf of racks holding balls]

courtesy/credits for image here

Upvotes: 3
