AJm

Reputation: 1023

What is the difference between mapPartitions and foreachPartition in Apache Spark?

I have a DataFrame in which one column holds comma-separated data.

For example, the data looks like this: [{value:1}, {value:2, value:3}, {some value}, {somevalue, othervalue}]

The column is of String datatype. I want to convert it to a List and apply some function. I already have a function that converts the String column to a List and applies the other logic.

Which function is better optimized for this, given that there are two similar-sounding functions, mapPartitions and foreachPartition? Do they have exactly the same performance, and which one should be used in which scenario?

Upvotes: 5

Views: 9325

Answers (1)

xan

Reputation: 418

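The difference is the same as that between map and foreach: map returns a new dataset built from the function's results, while foreach returns nothing and is used only for its side effects. Look here for good explanations - Is there a difference between foreach and map?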

mapPartitions is a transformation and foreachPartition is an action; both apply a function to each partition of the DataFrame as a whole, as opposed to each element. See here for an explanation contrasting map and mapPartitions - Apache Spark: map vs mapPartitions?
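
To make the contrast concrete, here is a small sketch in plain Python that only models the semantics. It is not the Spark API: the partitioned dataset is assumed to be a list of lists, and `map_partitions`, `foreach_partition`, `split_rows` and `record_sizes` are hypothetical names made up for this illustration.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a partitioned dataset: a list of partitions,
# each partition being a list of rows. This is NOT the Spark API.
Partitions = List[List[str]]


def map_partitions(parts: Partitions,
                   func: Callable[[Iterator[str]], Iterable[list]]) -> List[list]:
    """Model of mapPartitions: func is called once per partition and
    its output becomes the rows of a new dataset."""
    return [list(func(iter(p))) for p in parts]


def foreach_partition(parts: Partitions,
                      func: Callable[[Iterator[str]], None]) -> None:
    """Model of foreachPartition: func is called once per partition
    purely for its side effects; nothing is returned."""
    for p in parts:
        func(iter(p))


def split_rows(rows: Iterator[str]) -> Iterator[list]:
    # Convert each comma-separated string into a list of trimmed items.
    for row in rows:
        yield [item.strip() for item in row.split(",")]


data = [["a, b", "c"], ["d, e, f"]]  # two partitions, made-up rows

# Transformation-style: a new dataset comes back.
converted = map_partitions(data, split_rows)

# Action-style: only side effects happen (here, appending to a list
# that stands in for an external sink such as a database).
seen = []


def record_sizes(rows: Iterator[str]) -> None:
    seen.append(sum(1 for _ in rows))


result = foreach_partition(data, record_sizes)
```

In both cases the function receives an iterator over a whole partition, which is what makes per-partition setup (opening one connection per partition, for instance) worthwhile. Only the map-style call hands back data you can keep working with.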

From your description, it sounds like you want either map or foreach.
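
As a rough per-element illustration, again in plain Python rather than Spark: the sketch below assumes each cell is a simple comma-separated string, which may not match your real format, and `to_list`, `rows` and `sink` are hypothetical names.

```python
def to_list(cell: str) -> list:
    """Split a comma-separated string into trimmed, non-empty items."""
    return [part.strip() for part in cell.split(",") if part.strip()]


rows = ["1", "2, 3", "x, y"]  # made-up sample cells

# map-style: build a new collection from the converted values.
as_lists = list(map(to_list, rows))

# foreach-style: nothing is returned, the work is the side effect
# (appending to a list that stands in for an external sink).
sink = []
for cell in rows:
    sink.append(len(to_list(cell)))
```

If you need the converted lists afterwards, the map-style form is the one to reach for. For a plain delimiter split on a DataFrame column, Spark's built-in split column function may also be enough without a custom function.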

Upvotes: 6
