Shu

Reputation: 195

Is it good practice to call the Spark show() method in a production Spark job?

Using the DataFrame.show() API, we can take a quick look at the underlying data.

Is it good to use this method in a production Spark job?

I know we can comment out this kind of code before launching the job, but if we just leave it in, is that good practice?
Or will it cause performance issues?

Upvotes: 1

Views: 1379

Answers (3)

Palak Singhal

Reputation: 101

No, it's not good practice. Spark evaluates lazily, which means execution won't start until it is necessary. Transformations are only recorded in a Directed Acyclic Graph (DAG) that keeps track of the requested operations in order, and nothing is executed until an action is called. show() is an action, so every call triggers a job. Hence unnecessary actions like show() should be avoided.
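To make the lazy-evaluation point concrete, here is a toy model in plain Python. It is not Spark's API: `LazyPipeline`, `read_source` and the `reads` counter are hypothetical names made up for this sketch. It only shows the general idea that transformations are recorded without doing any work, while each action re-runs the recorded steps from the source.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (NOT Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = source        # callable that produces the input rows
        self._steps = tuple(steps)   # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: only record the step, do no work yet.
        return LazyPipeline(self._source, self._steps + (fn,))

    def _run(self):
        # Re-read the source and replay every recorded step.
        rows = self._source()
        for fn in self._steps:
            rows = [fn(row) for row in rows]
        return rows

    def collect(self):
        # Action: triggers a full run.
        return self._run()

    def show(self, n=20):
        # Action: triggers another run just to print a few rows.
        for row in self._run()[:n]:
            print(row)


reads = {"count": 0}  # how many times the source was actually read


def read_source():
    reads["count"] += 1
    return [1, 2, 3]


pipeline = LazyPipeline(read_source).map(lambda x: x * 10)
reads_after_transform = reads["count"]  # still 0: nothing has executed

pipeline.show()                  # first action, first read of the source
result = pipeline.collect()      # second action, second read of the source
reads_after_actions = reads["count"]
```

In this toy, the extra show() doubles the number of source reads. Real Spark is smarter than this sketch (as far as I know, show() only computes enough partitions to fill the requested rows), so the actual cost depends on the plan, but the extra job is still there.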

Upvotes: 2

Nikunj Kakadiya

Reputation: 3008

The show() command is an action, so we should not leave it in production code: it forces Spark to materialize results unnecessarily and ultimately slows the job down to some extent.
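If you would rather not comment the calls in and out by hand, one option is to put them behind a switch. This is only a minimal sketch: the helper `debug_show` and the environment variable `JOB_DEBUG_SHOW` are names invented for this example, and `df` is assumed to be any object with a `show(n)` method, such as a PySpark DataFrame.

```python
import os


def debug_show(df, n=5, enabled=None):
    """Call df.show(n) only when debugging is switched on.

    `JOB_DEBUG_SHOW` is a hypothetical environment variable used for this
    sketch; it is not a Spark setting.
    """
    if enabled is None:
        enabled = os.environ.get("JOB_DEBUG_SHOW") == "1"
    if enabled:
        df.show(n)  # action: this is what triggers the extra Spark job
    return enabled
```

With this in place, `debug_show(df)` is a no-op in production unless the variable is set to `1`, so no extra action is triggered there, and you can still pass `enabled=True` explicitly while developing.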

Upvotes: 0

Yaron

Reputation: 10450

The show() command is an action.

Adding unnecessary actions to the code can get in the way of the Spark optimizer: the optimizer is free to reorder transformations, but it has to run a job every time there is an action.
In other words, unnecessary actions limit what the optimizer can do.

See Actions vs Transformations

Upvotes: 4
