Reputation: 1023
I have a DataFrame in which one column holds comma-separated data.
For example, the data looks like this: [{value:1}, {value:2, value:3}, {some value}, {somevalue, othervalue}]
The column is of String datatype. I want to convert it to a List and apply some function. I already have a function that converts the String column to a List and applies the other logic.
But which is better and more optimized for this, given the two similar-sounding functions mapPartitions and foreachPartition? Do they have exactly the same performance, and which one should be used in which scenario?
Upvotes: 5
Views: 9325
Reputation: 418
The difference is the same as that between map and foreach. Look here for good explanations - Is there a difference between foreach and map?
mapPartitions and foreachPartition apply to each partition of the DataFrame as a whole, as opposed to each element: mapPartitions is a transformation that returns a new dataset, while foreachPartition is an action that runs only for its side effects. See here for an explanation contrasting map and mapPartitions - Apache Spark: map vs mapPartitions?
From your description, it sounds like you want either map or foreach: map if you need the converted values back, foreach if you only want to act on each row.
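As a minimal sketch of the per-element approach, again in plain Python and not Spark code: to_list is a hypothetical conversion function, and the sample rows assume that each brace group in the question is one cell of comma-separated items. In PySpark the same function could be passed to something like df.rdd.map, but that wiring is not shown here.

```python
from typing import List


def to_list(cell: str) -> List[str]:
    """Split one comma-separated cell into a list of trimmed, non-empty items."""
    return [item.strip() for item in cell.split(",") if item.strip()]


# Assumed sample cells, loosely based on the example in the question.
rows = ["value:1", "value:2, value:3", "somevalue, othervalue"]

# map-style: keep the converted value for every row.
converted = list(map(to_list, rows))

# foreach-style: nothing is kept except the side effect.
lengths = []
for cell in rows:
    lengths.append(len(to_list(cell)))
```

Here converted is the new per-row data you would carry forward, whereas the foreach-style loop leaves only whatever its body chose to record.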
Upvotes: 6