Reputation: 1534
I wonder if Spark SQL support caching result for the query defined in WITH clause. The Spark SQL query is something like this:
with base_view as
(
select some_columns from some_table
WHERE
expensive_udf(some_column) = true
)
... multiple query join based on this view
While this query works with Spark SQL, I noticed that the UDF were applied to the same data set multiple times. In this use case, the UDF is very expensive. So I'd like to cache the query result of base_view so the subsequent queries would benefit from the cached result.
P.S. I know you can create and cache a table with the given query and then reference it in the subqueries. In this specific case, though, I can't create any tables or views.
Upvotes: 3
Views: 3900
Reputation: 783
Not sure if you are still interested in the solution, but the following is a workaround to accomplish the same:-
spark.sql("""
| create temp view my_view
| as
| WITH base_view as
| (
| select some_columns
| from some_table
| WHERE
| expensive_udf(some_column) = true
| )
| SELECT *
| from base_view
""");
spark.sql("""CACHE TABLE my_view""");
Now you can use the my_view
temp view to join to other tables as shown below-
spark.sql("""
| select mv.col1, t2.col2, t3.col3
| from my_view mv
| join tab2 t2
| on mv.col2 = t2.col2
| join tab3 t3
| on mv.col3 = t3.col3
""");
Remember to uncache
the view after using-
spark.sql("""UNCACHE TABLE my_view""");
Hope this helps.
Upvotes: 0
Reputation: 21
This can be done by excuting several sql query.
-- first cache sql
spark.sql("
CACHE TABLE base_view as
select some_columns
from some_table
WHERE
expensive_udf(some_column) = true")
-- then use
spark.sql("
... multiple query join based on this view
")
Upvotes: 0
Reputation: 894
The WITH clause allows you to give a name to a temporary result set so it ca be reused several times within a single query. I believe what he's asking for is a materialized view.
Upvotes: 1
Reputation: 18003
That is not possible. The WITH result cannot be persisted after execution or substituted into new Spark SQL invocation.
Upvotes: 2