Reputation: 833
In pyspark, running:
sdf = sqlContext.sql("""SELECT * FROM t1 JOIN t2 on t1.c1 = t2.c1 """)
and then:
sdf.explain(extended=True)
prints the logical and physical plans for the query.
My question is: How can I capture the output in a variable, instead of printing it?
v = sdf.explain(extended=True)
naturally does not work, because explain() prints the plan and returns None.
Upvotes: 11
Views: 13297
Reputation: 15283
If you look at the source code of explain (version 2.4 or older), you see:
    def explain(self, extended=False):
        if extended:
            print(self._jdf.queryExecution().toString())
        else:
            print(self._jdf.queryExecution().simpleString())
So, to get the explain plan as a string, call _jdf.queryExecution() on your DataFrame yourself:
    v = sdf._jdf.queryExecution().toString()  # or .simpleString()
From 3.0 onward, the code is:
    print(
        self._sc._jvm.PythonSQLUtils.explainString(self._jdf.queryExecution(), explain_mode)
    )
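Dropping the print and making the same call on your own DataFrame gives you the explain output as a string, for example with the mode string "extended":
    v = sdf._sc._jvm.PythonSQLUtils.explainString(sdf._jdf.queryExecution(), "extended")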
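Both snippets above rely on private attributes (_jdf, _sc), which differ between versions. A different technique that avoids them is to redirect Python's stdout while explain runs. This is only a sketch: capture_explain is a hypothetical helper name, not part of PySpark, and it assumes explain writes through Python's print, as the source shown above does.

```python
import io
from contextlib import redirect_stdout


def capture_explain(df, **kwargs):
    """Return the text that df.explain(**kwargs) prints, as a string.

    Hypothetical helper: assumes df.explain() writes via Python's print(),
    so that redirecting sys.stdout is enough to catch the output.
    """
    buf = io.StringIO()
    with redirect_stdout(buf):
        # Anything printed inside this block goes to buf, not the console.
        df.explain(**kwargs)
    return buf.getvalue()
```

With the DataFrame from the question, usage would be v = capture_explain(sdf, extended=True). Output written directly by the JVM, rather than by Python's print, would not be captured this way.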
Upvotes: 22