Katya Willard

Reputation: 2182

Does count() cause map() code to execute in Spark?

So, I know that Spark is a lazy executor. For example, if I call

post = pre.filter(lambda x: some_condition(x)).map(lambda x: do_something(x))

I know that it won't immediately execute.

But what happens to the above code when I call post.count()? I imagine the filter would be forced to execute, since the filter condition means pre and post will likely not have the same number of rows. However, map is a 1-to-1 transformation, so it would not affect the count. Would the map be executed when I call count()?

Follow up: When I want to force execution of map statements (assuming count() doesn't work), what can I call to force execution? I'd prefer to not have to use saveAsTextFile().

Upvotes: 0

Views: 924

Answers (1)

zero323

Reputation: 330093

count will execute all transformations in the lineage unless some stages can be fetched from cache. This means that every transformation will be executed at least once, so as long as you don't depend on some kind of side effects triggered by some_condition or do_something, it should work just fine.
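To make the idea concrete without a cluster, here is a minimal plain-Python sketch. It is only an analogy: built-in lazy iterators stand in for the RDD transformations, and it is not Spark code. The bodies of some_condition and do_something, the calls counter, and the input range are all made up for illustration.

```python
# Plain-Python analogy only: lazy iterators stand in for Spark's
# lazy transformations. Nothing here uses or imitates the Spark API.

# Hypothetical call counters so we can see when each function runs.
calls = {"some_condition": 0, "do_something": 0}

def some_condition(x):
    # Placeholder predicate (assumption): keep even numbers.
    calls["some_condition"] += 1
    return x % 2 == 0

def do_something(x):
    # Placeholder 1-to-1 mapping (assumption): multiply by 10.
    calls["do_something"] += 1
    return x * 10

pre = range(10)

# Building the pipeline runs neither function yet.
post = map(do_something, filter(some_condition, pre))
calls_before = dict(calls)

# Counting has to consume the whole pipeline, so the mapping function
# runs for every element that passes the filter, even though its
# result does not change the count.
n = sum(1 for _ in post)
calls_after = dict(calls)

print(calls_before)
print(n, calls_after)
```

The counter stays at zero until the elements are counted, and afterwards do_something has been called once per surviving element. The counter is exactly the kind of side effect mentioned above: because a transformation can run more than once, it is fine for observing that the code ran but should not be relied on for exact results.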

Upvotes: 6
