Zeruno
Zeruno

Reputation: 1639

General parallelism with PostgreSQL CTEs

I'm working with some large data and getting parallel plans in my queries is necessary. I also quite like to use CTEs to express my queries, but following PostgreSQL's documentation, I'm not so sure whether CTEs pose a serious limitation on parallelism or not.

Here, CTEs and temporary tables are marked as 'parallel restricted', where 'parallel restricted' is defined as

A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use.

Here, the description on parallel limitations as far as CTEs are considered is a bit different:

If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated.

In my case, I don't have any data-modifying operations.

To what degree will CTEs limit the quality of my parallel plan, if at all?

To be fair, I've had some difficulty understanding the implications of the first definition. Since, CTEs can be materialized as temporary tables, then I'm sure that this impact is even more relevant. And the second definition suggests that CTE parallelism limitations are only related to data-modifying operations.

Upvotes: 2

Views: 1881

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247445

Using a CTE is fine and won't hamper parallel execution in most cases.

The restriction is that the CTE lives in the private process state of the parallel leader process, so parallel workers cannot scan it. Other than that, PostgreSQL will happily generate a parallel plan.

So things to avoid might be queries like

WITH a AS (SELECT ...)
SELECT ... FROM a JOIN b ...

if you would like the join to be parallelized. The query inside the CTE can be parallelized (I think), and scanning b can be parallelized as well.

For optimal parallelization, you could try to rewrite the query as

SELECT ...
FROM (SELECT ...) AS a
   JOIN b ...

Upvotes: 3

Related Questions