Postgresql greatly underestimate returned rows

Question

I got an issue with Postgresql greatly underestimating number of rows returned by a join between CTE and a table. The xfactor is about 42 and I think that this leads Postgres to choose nested loop.
As you can see in EXPLAIN ANALYZE output, aggregating consume nearly 100% of the total time.
The most interesting thing is that I've tried to add another query level to perform the sum afterwards and it is decreasing time from 33s to 6s.
I think that nested-loop is joining table in such a bad way that aggregating the output is a pain for Postgres. Or maybe I'm missing something important ?

Tables (each one containing 50k rows) :

CREATE TABLE IF NOT EXISTS public.forme_iris
(
    code_iris character varying(20) COLLATE pg_catalog."default" NOT NULL,
    geometry geometry,
    CONSTRAINT forme_iris_pkey PRIMARY KEY (code_iris)
)

CREATE TABLE IF NOT EXISTS public.dl_rawdata_recensement
(
    iris character varying(200)
    pop_0_14_ans_en_2016_princ character varying(223),
    pop_15_29_ans_en_2016_princ character varying(224),
    pop_30_44_ans_en_2016_princ character varying(225),
    pop_45_59_ans_en_2016_princ character varying(226),
    pop_60_74_ans_en_2016_princ character varying(220),
    pop_75_ans_ou_plus_en_2016_princ character varying(228),
    CONSTRAINT dl_rawdata_recensement_pkey PRIMARY KEY (iris)
)

Covering Index :

CREATE INDEX dl_rawdata_recensement_cvg_idx1
    ON public.dl_rawdata_recensement USING btree
    (iris COLLATE pg_catalog."default" ASC NULLS LAST)
    INCLUDE(pop_0_14_ans_en_2016_princ, pop_15_29_ans_en_2016_princ, pop_30_44_ans_en_2016_princ, pop_45_59_ans_en_2016_princ, pop_60_74_ans_en_2016_princ, pop_75_ans_ou_plus_en_2016_princ, pop_15_ans_ou_plus_hommes_en_2016_cplt, pop_15_ans_ou_plus_femmes_en_2016_cplt, pop_0_14_ans_en_2016_princ)
    TABLESPACE pg_default;

Whole query:

WITH cte
AS (
    SELECT ST_SetSrid(ST_GeomFromGeoJson('{geometry}'), 4326) AS geom
)
SELECT SUM(pourcentage * (pop_0_14_ans_en_2016_princ + pop_15_29_ans_en_2016_princ + pop_30_44_ans_en_2016_princ + pop_45_59_ans_en_2016_princ + pop_60_74_ans_en_2016_princ + pop_75_ans_ou_plus_en_2016_princ)) total_population,
       SUM(pourcentage * pop_0_14_ans_en_2016_princ) total_0_14,
       SUM(pourcentage * pop_15_29_ans_en_2016_princ) total_15_29,
       SUM(pourcentage * pop_30_44_ans_en_2016_princ) total_30_44,
       SUM(pourcentage * pop_45_59_ans_en_2016_princ) total_45_59,
       SUM(pourcentage * pop_60_74_ans_en_2016_princ) total_60_74,
       SUM(pourcentage * pop_75_ans_ou_plus_en_2016_princ) total_75
FROM (SELECT (ST_Area(ST_Intersection(iris.geometry, cte.geom)) / ST_Area(iris.geometry)) AS pourcentage, 
             code_iris
      FROM forme_iris iris
      JOIN cte
      ON ST_Intersects(iris.geometry, cte.geom)) AS a
JOIN public.dl_rawdata_recensement population
ON population.iris = a.code_iris

Subquery concerned by the nested-loop:

SELECT (ST_Area(ST_Intersection(iris.geometry, cte.geom)) / ST_Area(iris.geometry)) AS pourcentage, 
       code_iris
FROM forme_iris iris
JOIN cte
ON ST_Intersects(iris.geometry, cte.geom)

I've vacuumed + analyzed both table before performing the request. Is Postgres still missing some statistics ?

PostgreSQL version: PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

Postgis version: 2.5 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

EXPLAIN ANALYZE: https://explain.depesz.com/s/sjON

Thanks !

Postgresql greatly underestimate returned rows

Answers (1)

Related Questions