Reputation: 2182
So, I know that Spark is a lazy executor. For example, if I call
post = pre.filter(lambda x: some_condition(x)).map(lambda x: do_something(x))
I know that it won't immediately execute.
But what happens to the above code when I call post.count()? I imagine the filtering would be forced into execution, since pre and post will likely not have the same number of rows, given the filter condition. However, map is a 1-to-1 relationship, so the count would not be affected by it. Would the map command be executed here given the count()?
Follow up: When I want to force execution of map statements (assuming count() doesn't work), what can I call to force execution? I'd prefer not to have to use saveAsTextFile().
Upvotes: 0
Views: 924
Reputation: 330093
count will execute all transformations in the lineage unless some stages can be fetched from cache. That means every transformation will be executed at least once, so as long as you don't depend on side effects triggered by some_condition or do_something, it should work just fine.
Upvotes: 6