Reputation: 195
Using the DataFrame.show() API, we can take a glance at the underlying data.
Is it good to use this method in a production Spark job?
I know we can comment out this kind of code before launching the job, but if we just keep it, is that a good practice?
Or will it cause performance issues?
Upvotes: 1
Views: 1379
Reputation: 101
No, it's not a good practice. Spark evaluates lazily, which means execution doesn't start until it is actually needed. It builds a Directed Acyclic Graph (DAG) to keep track of the requested transformations in order, but it won't execute anything until an action is called. Hence unnecessary actions like show() should be avoided.
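To make the lazy-evaluation point concrete, here is a toy model in plain Python. It is NOT Spark's API: `LazyFrame`, its `map`/`filter`/`collect`/`show` methods and the `reads` counter are hypothetical names chosen for illustration. Transformations only append steps to a recorded plan; nothing touches the source until an action runs the whole plan.

```python
from typing import Callable, Iterable, List


class LazyFrame:
    """Toy model of lazy evaluation (not Spark's API).

    Transformations only record steps in a plan; an action executes the plan.
    """

    def __init__(self, source: Callable[[], Iterable[int]], plan=None):
        self._source = source      # function that "reads" the input data
        self._plan = plan or []    # recorded transformations, in order

    # Transformations: return a new LazyFrame and execute nothing.
    def map(self, fn):
        return LazyFrame(self._source, self._plan + [("map", fn)])

    def filter(self, pred):
        return LazyFrame(self._source, self._plan + [("filter", pred)])

    # Actions: read the source and run every recorded step.
    def collect(self) -> List[int]:
        rows = self._source()
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)

    def show(self, n: int = 20) -> None:
        for row in self.collect()[:n]:
            print(row)


reads = {"count": 0}

def read_source():
    reads["count"] += 1            # count how many times the input is scanned
    return range(10)

df = LazyFrame(read_source).filter(lambda x: x % 2 == 0).map(lambda x: x * 10)
print(reads["count"])  # 0: building the plan read nothing
df.show(3)             # prints 0, 20 and 40, one per line
print(reads["count"])  # 1: the action executed the plan
```

Every further action (another `show()`, a `collect()`, a write) would scan the source again, which is why a leftover `show()` is not free.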
Upvotes: 2
Reputation: 3008
The show() command is an action, so we should not use it in production code: it forces Spark to materialize the data unnecessarily, which can slow down the job.
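If you still want to keep `show()` calls in the code for troubleshooting, one option is to guard them so the extra action only runs when debugging is switched on. The sketch below assumes a PySpark `DataFrame` (whose `show(n)` method exists); the helper name `debug_show` and the logger name `my_spark_job` are hypothetical, and tying the switch to the logger level is just one possible design choice.

```python
import logging
from typing import Optional

# Hypothetical logger name; configure it the way your job configures logging.
logger = logging.getLogger("my_spark_job")


def debug_show(df, n: int = 20, enabled: Optional[bool] = None) -> bool:
    """Call df.show(n) only when debug output is enabled.

    `df` is expected to be a PySpark DataFrame (any object with a
    .show(n) method works). Returns True if show() was actually called,
    i.e. if the extra action and the Spark job it triggers really ran.
    """
    if enabled is None:
        # Default: follow the logger level, so runs at INFO or above skip it.
        enabled = logger.isEnabledFor(logging.DEBUG)
    if enabled:
        df.show(n)
        return True
    return False
```

With this pattern, a production run at the default log level never triggers the action, while setting the logger to `DEBUG` brings the output back without editing the code.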
Upvotes: 0
Reputation: 10450
The show() command is an action.
Adding unnecessary actions to the code might hinder the Spark optimizer: the optimizer can change the order of transformations, but it has to trigger execution every time there is an action.
In other words, unnecessary actions limit what the optimizer can do.
See Actions vs Transformations
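The cost of an extra action can be illustrated with another toy model in plain Python. It is not Spark and does not model the optimizer's reordering; it only models the "every action is its own job that replays the lineage" part. `Pipeline`, `expensive` and `run` are hypothetical names. An intermediate action, standing in for a leftover `show()`, makes the shared upstream step run twice.

```python
from typing import Callable, Iterable, List, Tuple


class Pipeline:
    """Toy model of Spark's job behaviour (not Spark's API).

    Transformations are only recorded; every action is a separate "job"
    that replays the whole recorded lineage from the source data.
    """

    def __init__(self, data: Iterable[int], steps: Tuple[Callable, ...] = (), jobs=None):
        self.data = list(data)
        self.steps = steps
        self.jobs = jobs if jobs is not None else [0]  # counter shared by derived pipelines

    def transform(self, fn: Callable[[int], int]) -> "Pipeline":
        return Pipeline(self.data, self.steps + (fn,), self.jobs)

    def action(self) -> List[int]:
        self.jobs[0] += 1              # one action -> one job
        rows = self.data
        for fn in self.steps:          # the lineage is re-executed from scratch
            rows = [fn(r) for r in rows]
        return rows


calls = {"expensive": 0}

def expensive(x: int) -> int:
    calls["expensive"] += 1            # count how often this step really runs
    return x + 1

def run(with_debug_action: bool):
    calls["expensive"] = 0
    stage1 = Pipeline(range(5)).transform(expensive)
    if with_debug_action:
        stage1.action()                # stands in for a leftover df.show()
    result = stage1.transform(lambda x: x * 2).action()
    return result, stage1.jobs[0], calls["expensive"]

print(run(False))  # ([2, 4, 6, 8, 10], 1, 5)
print(run(True))   # ([2, 4, 6, 8, 10], 2, 10)
```

The final result is identical in both runs, but the version with the extra action launches two jobs and evaluates the upstream step twice.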
Upvotes: 4