Borislav Aymaliev

Reputation: 833

Capturing the result of explain() in pyspark

In pyspark, running:

sdf = sqlContext.sql("""SELECT * FROM t1 JOIN t2 on t1.c1 = t2.c1 """)

and then:

sdf.explain(extended=True)

prints the logical and physical plans of the query.

My question is: How can I capture the output in a variable, instead of printing it?

Naturally, v = sdf.explain(extended=True) does not work: explain() prints the plan and returns None.

Upvotes: 11

Views: 13297

Answers (1)

Steven

Reputation: 15283

If you look at the source code of explain (version 2.4 or older), you see:

def explain(self, extended=False):
    if extended:
        print(self._jdf.queryExecution().toString())
    else:
        print(self._jdf.queryExecution().simpleString())

Therefore, to get the plan as a string instead of printing it, call _jdf.queryExecution() on your DataFrame yourself and convert the result with toString() or simpleString():

v = sdf._jdf.queryExecution().toString()  # or .simpleString()
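
Since explain() just calls Python's print, another option that avoids the private _jdf attribute is to redirect stdout while it runs. This is only a minimal sketch: capture_stdout and fake_explain are hypothetical names, and fake_explain is a stand-in so the snippet runs without a Spark session.

```python
import io
from contextlib import redirect_stdout


def capture_stdout(func, *args, **kwargs):
    """Call func and return everything it printed to sys.stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


# Hypothetical stand-in for DataFrame.explain, which also uses print().
def fake_explain(extended=False):
    print("== Parsed Logical Plan ==" if extended else "== Physical Plan ==")


plan_text = capture_stdout(fake_explain, extended=True)

# With a real DataFrame the call would look like:
#   plan_text = capture_stdout(sdf.explain, extended=True)
```

This only captures what goes through Python's sys.stdout, so it would not pick up anything the JVM writes directly to the console.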

From version 3.0 on, the code is:

print(
    self._sc._jvm.PythonSQLUtils.explainString(self._jdf.queryExecution(), explain_mode)
)

Remove the print and the call returns the plan as a string. Here explain_mode is a mode name such as "simple" or "extended".
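
Wrapped up as a function, the 3.0 variant might look like the sketch below. explain_string is a hypothetical helper name, and it assumes a classic (non-Connect) PySpark 3.x DataFrame that exposes the private _sc and _jdf attributes used in the source above; those are internals and may change between releases.

```python
def explain_string(df, mode="extended"):
    """Return the text that df.explain(mode=mode) would print.

    Assumes PySpark 3.0+ internals (df._sc and df._jdf). Modes accepted
    by explain() in 3.0 are "simple", "extended", "codegen", "cost"
    and "formatted".
    """
    query_execution = df._jdf.queryExecution()
    return df._sc._jvm.PythonSQLUtils.explainString(query_execution, mode)


# Usage with a real DataFrame:
#   v = explain_string(sdf)            # extended plan
#   v = explain_string(sdf, "simple")  # physical plan only
```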

Upvotes: 22
