PostgreSQL: Speed up multiple JOINs of one big table with many small tables

Question

I am trying to speed up my PostgreSQL reporting query. I simplyfied the data model for better understanding.

My reporting query uses the following tables:

1 big table with ~500.000 rows. This can easily grow to millions of rows.
~20 small tables with each ~20 rows. They just contain texts getting added to the big table.

The big table looks like this:

CREATE TABLE big_table (
    id character varying(100) NOT NULL,
    attr_1_id character varying(100) NOT NULL,
    attr_2_id character varying(100) NOT NULL,
    attr_3_id character varying(100) NOT NULL,
    ...
    CONSTRAINT big_table_pkey PRIMARY KEY (id)
);

And the small tables all look like this:

CREATE TABLE small_table_1 (
    id character varying(100) NOT NULL,
    language_id character(2) NOT NULL,
    text character varying(100),
    CONSTRAINT small_table_1_pkey PRIMARY KEY (id)
);

The query looks like this:

SELECT
    big_table.attr_1_id,
    small_table_1.text,

    big_table.attr_2_id,
    small_table_2.text,

    big_table.attr_3_id,
    small_table_3.text,

    ...
    
FROM big_table

LEFT JOIN small_table_1
ON small_table_1.id = big_table.attr_id_1
AND small_table_1.language_id = 'DE' 

LEFT JOIN small_table_2
ON small_table_2.id = big_table.attr_id_2
AND small_table_2.language_id = 'DE'

LEFT JOIN small_table_3
ON small_table_3.id = big_table.attr_id_3
AND small_table_3.language_id = 'DE'

...

With ~500.000 rows in the big table every LEFT JOIN already takes ~200ms. So with millions of rows and many small tables to join the execution time grows quickly.

In the query planner each LEFT JOIN looks like this:

Hash Left Join  (cost=31.94..13727.27 rows=432242 width=148) (actual time=4.398..581.194 rows=432242 loops=1)
    Hash Cond: ((small_table_1.id)::text = (big_table.attr_id_1)::text)

Here is the complete real query plan. I still had to remove the LEFT JOINs of 10 small tables to not hit the Stack Overflow max character limit of 30,000.

"QUERY PLAN"
"Hash Left Join  (cost=208.64..400076.38 rows=460222 width=3163) (actual time=7.508..7178.265 rows=493850 loops=1)"
"  Hash Cond: ((""*SELECT* 1"".employment_status_id)::text = (estx.id)::text)"
"  Buffers: shared hit=8290"
"  CTE q2"
"    ->  Seq Scan on cust_group  (cost=0.00..13.00 rows=300 width=16) (actual time=0.002..0.004 rows=1 loops=1)"
"          Buffers: shared hit=1"
"  InitPlan 2 (returns $1)"
"    ->  CTE Scan on q2  (cost=0.00..6.00 rows=300 width=16) (actual time=0.001..0.001 rows=1 loops=1)"
"  InitPlan 3 (returns $2)"
"    ->  CTE Scan on q2 q2_1  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.000 rows=1 loops=1)"
"  InitPlan 4 (returns $3)"
"    ->  CTE Scan on q2 q2_2  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.001 rows=1 loops=1)"
"  InitPlan 5 (returns $4)"
"    ->  CTE Scan on q2 q2_3  (cost=0.00..6.00 rows=300 width=16) (actual time=0.004..0.005 rows=1 loops=1)"
"          Buffers: shared hit=1"
"  InitPlan 6 (returns $5)"
"    ->  CTE Scan on q2 q2_4  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.001 rows=1 loops=1)"
"  InitPlan 7 (returns $6)"
"    ->  CTE Scan on q2 q2_5  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.000 rows=1 loops=1)"
"  ->  Hash Left Join  (cost=158.57..338963.78 rows=460222 width=2513) (actual time=6.695..4724.369 rows=493850 loops=1)"
"        Hash Cond: (((NULL::character varying))::text = (citx.id)::text)"
"        Buffers: shared hit=8289"
"        ->  Hash Left Join  (cost=156.37..337727.90 rows=460222 width=2490) (actual time=5.361..4492.058 rows=493850 loops=1)"
"              Hash Cond: (""*SELECT* 1"".action_type_id = attx.id)"
"              Buffers: shared hit=8288"
"              ->  Hash Left Join  (cost=155.00..336492.84 rows=460222 width=2478) (actual time=3.754..4261.646 rows=493850 loops=1)"
"                    Hash Cond: ((""*SELECT* 1"".job_id)::text = (jbtx.id)::text)"
"                    Buffers: shared hit=8287"
"                    ->  Hash Left Join  (cost=128.25..335232.41 rows=460222 width=2457) (actual time=1.704..4048.729 rows=493850 loops=1)"
"                          Hash Cond: ((""*SELECT* 1"".employee_type_id)::text = (ettx.id)::text)"
"                          Buffers: shared hit=8264"
"                          ->  Nested Loop Left Join  (cost=126.96..333997.44 rows=460222 width=2239) (actual time=0.642..3788.566 rows=493850 loops=1)"
"                                Buffers: shared hit=8263"
"                                ->  Nested Loop Left Join  (cost=126.81..328236.49 rows=460222 width=2222) (actual time=0.622..3478.490 rows=493850 loops=1)"
"                                      Buffers: shared hit=8261"
"                                      ->  Hash Left Join  (cost=126.67..322475.55 rows=460222 width=2205) (actual time=0.612..3155.816 rows=493850 loops=1)"
"                                            Hash Cond: ((""*SELECT* 1"".org_unit_id)::text = (outx.id)::text)"
"                                            Buffers: shared hit=8259"
"                                            ->  Hash Left Join  (cost=98.98..321214.18 rows=460222 width=2185) (actual time=0.541..2969.331 rows=493850 loops=1)"
"                                                  Hash Cond: ((""*SELECT* 1"".cost_center_id)::text = (cctx.id)::text)"
"                                                  Buffers: shared hit=8236"
"                                                  ->  Hash Left Join  (cost=96.93..319978.45 rows=460222 width=2175) (actual time=0.519..2712.470 rows=493850 loops=1)"
"                                                        Hash Cond: (coat.country_id = (cutx.id)::bpchar)"
"                                                        Buffers: shared hit=8235"
"                                                        ->  Hash Left Join  (cost=83.59..313061.78 rows=460222 width=2163) (actual time=0.412..2404.622 rows=493850 loops=1)"
"                                                              Hash Cond: ((""*SELECT* 1"".company_id)::text = (cotx.id)::text)"
"                                                              Buffers: shared hit=8231"
"                                                              ->  Hash Left Join  (cost=71.45..311815.96 rows=460222 width=1945) (actual time=0.382..2131.789 rows=493850 loops=1)"
"                                                                    Hash Cond: ((NULL::bpchar) = (crtx_le.id)::bpchar)"
"                                                                    Buffers: shared hit=8230"
"                                                                    ->  Nested Loop Left Join  (cost=61.14..305017.38 rows=460222 width=1928) (actual time=0.298..1886.031 rows=493850 loops=1)"
"                                                                          Buffers: shared hit=8227"
"                                                                          ->  Hash Left Join  (cost=60.99..299256.43 rows=460222 width=1896) (actual time=0.282..1584.799 rows=493850 loops=1)"
"                                                                                Hash Cond: ((""*SELECT* 1"".employee_id)::text = (emat.id)::text)"
"                                                                                Buffers: shared hit=8225"
"                                                                                ->  Hash Left Join  (cost=47.75..298019.21 rows=460222 width=1871) (actual time=0.131..1287.387 rows=493850 loops=1)"
"                                                                                      Hash Cond: (""*SELECT* 1"".scenario_id = scen.id)"
"                                                                                      Buffers: shared hit=8219"
"                                                                                      ->  Hash Left Join  (cost=46.70..296784.48 rows=460222 width=1853) (actual time=0.122..1026.567 rows=493850 loops=1)"
"                                                                                            Hash Cond: (""*SELECT* 1"".action_id = aclg.action_id)"
"                                                                                            Buffers: shared hit=8218"
"                                                                                            ->  Hash Left Join  (cost=38.02..292886.93 rows=460222 width=1789) (actual time=0.046..883.004 rows=493850 loops=1)"
"                                                                                                  Hash Cond: ((""*SELECT* 1"".company_id)::text = (coat.id)::text)"
"                                                                                                  Buffers: shared hit=8212"
"                                                                                                  ->  Hash Left Join  (cost=21.05..291645.36 rows=460222 width=1773) (actual time=0.035..712.011 rows=493850 loops=1)"
"                                                                                                        Hash Cond: ((NULL::bpchar) = (conv_gr.from_currency_id)::bpchar)"
"                                                                                                        Buffers: shared hit=8211"
"                                                                                                        ->  Append  (cost=0.00..285204.21 rows=460222 width=1742) (actual time=0.013..465.417 rows=493850 loops=1)"
"                                                                                                              Buffers: shared hit=8209"
"                                                                                                              ->  Subquery Scan on ""*SELECT* 1""  (cost=0.00..2739.92 rows=68841 width=178) (actual time=0.012..55.052 rows=68841 loops=1)"
"                                                                                                                    Buffers: shared hit=1191"
"                                                                                                                    ->  Seq Scan on appl_plan_headcount  (cost=0.00..1879.41 rows=68841 width=150) (actual time=0.010..32.481 rows=68841 loops=1)"
"                                                                                                                          Buffers: shared hit=1191"
"                                                                                                              ->  Subquery Scan on ""*SELECT* 2""  (cost=3405.85..280163.18 rows=391381 width=151) (actual time=36.277..350.251 rows=425009 loops=1)"
"                                                                                                                    Buffers: shared hit=7018"
"                                                                                                                    ->  Gather  (cost=3405.85..274292.47 rows=391381 width=95) (actual time=36.275..231.496 rows=425009 loops=1)"
"                                                                                                                          Workers Planned: 2"
"                                                                                                                          Workers Launched: 2"
"                                                                                                                          Buffers: shared hit=7018"
"                                                                                                                          ->  Parallel Hash Join  (cost=2405.85..234154.37 rows=163075 width=95) (actual time=26.658..228.421 rows=141670 loops=3)"
"                                                                                                                                Hash Cond: ((co.scenario_id = hc.scenario_id) AND ((co.employee_id)::text = (hc.employee_id)::text) AND (co.cal_year_month_id = hc.cal_year_month_id) AND (co.action_id = hc.action_id))"
"                                                                                                                                Buffers: shared hit=7018"
"                                                                                                                                ->  Parallel Seq Scan on appl_plan_costs co  (cost=0.00..7464.01 rows=180101 width=45) (actual time=0.007..52.792 rows=144081 loops=3)"
"                                                                                                                                      Buffers: shared hit=5663"
"                                                                                                                                ->  Parallel Hash  (cost=1595.95..1595.95 rows=40495 width=62) (actual time=25.189..25.189 rows=22947 loops=3)"
"                                                                                                                                      Buckets: 131072  Batches: 1  Memory Usage: 8096kB"
"                                                                                                                                      Buffers: shared hit=1191"
"                                                                                                                                      ->  Parallel Seq Scan on appl_plan_headcount hc  (cost=0.00..1595.95 rows=40495 width=62) (actual time=0.009..10.810 rows=22947 loops=3)"
"                                                                                                                                            Buffers: shared hit=1191"
"                                                                                                        ->  Hash  (cost=21.00..21.00 rows=4 width=48) (actual time=0.016..0.017 rows=6 loops=1)"
"                                                                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                              Buffers: shared hit=2"
"                                                                                                              ->  Seq Scan on cust_currency_conversion conv_gr  (cost=0.00..21.00 rows=4 width=48) (actual time=0.011..0.013 rows=6 loops=1)"
"                                                                                                                    Filter: ((to_currency_id)::text = ($4)::text)"
"                                                                                                                    Rows Removed by Filter: 6"
"                                                                                                                    Buffers: shared hit=2"
"                                                                                                  ->  Hash  (cost=13.10..13.10 rows=310 width=234) (actual time=0.006..0.006 rows=8 loops=1)"
"                                                                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                        Buffers: shared hit=1"
"                                                                                                        ->  Seq Scan on appl_company_attr coat  (cost=0.00..13.10 rows=310 width=234) (actual time=0.003..0.004 rows=8 loops=1)"
"                                                                                                              Buffers: shared hit=1"
"                                                                                            ->  Hash  (cost=7.50..7.50 rows=94 width=72) (actual time=0.070..0.072 rows=91 loops=1)"
"                                                                                                  Buckets: 1024  Batches: 1  Memory Usage: 17kB"
"                                                                                                  Buffers: shared hit=6"
"                                                                                                  ->  Hash Left Join  (cost=3.18..7.50 rows=94 width=72) (actual time=0.016..0.053 rows=91 loops=1)"
"                                                                                                        Hash Cond: ((aclg.created_by)::text = (ausr.id)::text)"
"                                                                                                        Buffers: shared hit=6"
"                                                                                                        ->  Seq Scan on appl_plan_action_log aclg  (cost=0.00..3.94 rows=94 width=58) (actual time=0.003..0.013 rows=91 loops=1)"
"                                                                                                              Buffers: shared hit=3"
"                                                                                                        ->  Hash  (cost=3.08..3.08 rows=8 width=24) (actual time=0.008..0.009 rows=5 loops=1)"
"                                                                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                              Buffers: shared hit=3"
"                                                                                                              ->  Seq Scan on auth_user ausr  (cost=0.00..3.08 rows=8 width=24) (actual time=0.003..0.007 rows=5 loops=1)"
"                                                                                                                    Buffers: shared hit=3"
"                                                                                      ->  Hash  (cost=1.02..1.02 rows=2 width=22) (actual time=0.005..0.005 rows=4 loops=1)"
"                                                                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                            Buffers: shared hit=1"
"                                                                                            ->  Seq Scan on appl_scenarios scen  (cost=0.00..1.02 rows=2 width=22) (actual time=0.003..0.003 rows=4 loops=1)"
"                                                                                                  Buffers: shared hit=1"
"                                                                                ->  Hash  (cost=9.22..9.22 rows=322 width=31) (actual time=0.146..0.146 rows=370 loops=1)"
"                                                                                      Buckets: 1024  Batches: 1  Memory Usage: 33kB"
"                                                                                      Buffers: shared hit=6"
"                                                                                      ->  Seq Scan on appl_employee_attr emat  (cost=0.00..9.22 rows=322 width=31) (actual time=0.005..0.057 rows=370 loops=1)"
"                                                                                            Buffers: shared hit=6"
"                                                                          ->  Materialize  (cost=0.15..8.18 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                                                                Buffers: shared hit=2"
"                                                                                ->  Index Scan using cust_currency_conversion_pkey on cust_currency_conversion conv_ta  (cost=0.15..8.17 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1)"
"                                                                                      Index Cond: (((from_currency_id)::text = ($5)::text) AND ((to_currency_id)::text = 'USD'::text))"
"                                                                                      Buffers: shared hit=2"
"                                                                    ->  Hash  (cost=7.88..7.88 rows=195 width=21) (actual time=0.080..0.081 rows=195 loops=1)"
"                                                                          Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"                                                                          Buffers: shared hit=3"
"                                                                          ->  Seq Scan on appl_currency_text crtx_le  (cost=0.00..7.88 rows=195 width=21) (actual time=0.007..0.051 rows=195 loops=1)"
"                                                                                Filter: (language_id = 'DE'::bpchar)"
"                                                                                Rows Removed by Filter: 195"
"                                                                                Buffers: shared hit=3"
"                                                              ->  Hash  (cost=12.12..12.12 rows=1 width=436) (actual time=0.007..0.007 rows=8 loops=1)"
"                                                                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                    Buffers: shared hit=1"
"                                                                    ->  Seq Scan on appl_company_text cotx  (cost=0.00..12.12 rows=1 width=436) (actual time=0.003..0.005 rows=8 loops=1)"
"                                                                          Filter: (language_id = 'DE'::bpchar)"
"                                                                          Rows Removed by Filter: 8"
"                                                                          Buffers: shared hit=1"
"                                                        ->  Hash  (cost=10.23..10.23 rows=249 width=16) (actual time=0.102..0.102 rows=249 loops=1)"
"                                                              Buckets: 1024  Batches: 1  Memory Usage: 20kB"
"                                                              Buffers: shared hit=4"
"                                                              ->  Seq Scan on appl_country_text cutx  (cost=0.00..10.23 rows=249 width=16) (actual time=0.006..0.064 rows=249 loops=1)"
"                                                                    Filter: (language_id = 'DE'::bpchar)"
"                                                                    Rows Removed by Filter: 249"
"                                                                    Buffers: shared hit=4"
"                                                  ->  Hash  (cost=1.70..1.70 rows=28 width=20) (actual time=0.015..0.016 rows=28 loops=1)"
"                                                        Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"                                                        Buffers: shared hit=1"
"                                                        ->  Seq Scan on appl_cost_center_text cctx  (cost=0.00..1.70 rows=28 width=20) (actual time=0.003..0.010 rows=28 loops=1)"
"                                                              Filter: (language_id = 'DE'::bpchar)"
"                                                              Rows Removed by Filter: 28"
"                                                              Buffers: shared hit=1"
"                                            ->  Hash  (cost=26.12..26.12 rows=125 width=29) (actual time=0.065..0.066 rows=125 loops=1)"
"                                                  Buckets: 1024  Batches: 1  Memory Usage: 16kB"
"                                                  Buffers: shared hit=23"
"                                                  ->  Seq Scan on appl_org_unit_text outx  (cost=0.00..26.12 rows=125 width=29) (actual time=0.004..0.046 rows=125 loops=1)"
"                                                        Filter: (language_id = 'DE'::bpchar)"
"                                                        Rows Removed by Filter: 125"
"                                                        Buffers: shared hit=23"
"                                      ->  Materialize  (cost=0.15..8.17 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                            Buffers: shared hit=2"
"                                            ->  Index Scan using appl_currency_text_pkey on appl_currency_text crtx_gr  (cost=0.15..8.17 rows=1 width=17) (actual time=0.007..0.008 rows=1 loops=1)"
"                                                  Index Cond: (((id)::text = ($6)::text) AND (language_id = 'DE'::bpchar))"
"                                                  Buffers: shared hit=2"
"                                ->  Materialize  (cost=0.15..8.17 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                      Buffers: shared hit=2"
"                                      ->  Index Scan using appl_currency_text_pkey on appl_currency_text crtx_ta  (cost=0.15..8.17 rows=1 width=17) (actual time=0.007..0.010 rows=1 loops=1)"
"                                            Index Cond: (((id)::text = 'USD'::text) AND (language_id = 'DE'::bpchar))"
"                                            Buffers: shared hit=2"
"                          ->  Hash  (cost=1.27..1.27 rows=1 width=436) (actual time=0.503..0.504 rows=11 loops=1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                Buffers: shared hit=1"
"                                ->  Seq Scan on appl_employee_type_text ettx  (cost=0.00..1.27 rows=1 width=436) (actual time=0.007..0.010 rows=11 loops=1)"
"                                      Filter: (language_id = 'DE'::bpchar)"
"                                      Rows Removed by Filter: 11"
"                                      Buffers: shared hit=1"
"                    ->  Hash  (cost=25.50..25.50 rows=100 width=29) (actual time=0.706..0.707 rows=100 loops=1)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 14kB"
"                          Buffers: shared hit=23"
"                          ->  Seq Scan on appl_job_text jbtx  (cost=0.00..25.50 rows=100 width=29) (actual time=0.004..0.047 rows=100 loops=1)"
"                                Filter: (language_id = 'DE'::bpchar)"
"                                Rows Removed by Filter: 100"
"                                Buffers: shared hit=23"
"              ->  Hash  (cost=1.25..1.25 rows=10 width=16) (actual time=0.409..0.410 rows=10 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                    Buffers: shared hit=1"
"                    ->  Seq Scan on appl_action_type_text attx  (cost=0.00..1.25 rows=10 width=16) (actual time=0.006..0.008 rows=10 loops=1)"
"                          Filter: (language_id = 'DE'::bpchar)"
"                          Rows Removed by Filter: 10"
"                          Buffers: shared hit=1"
"        ->  Hash  (cost=1.80..1.80 rows=32 width=26) (actual time=0.360..0.360 rows=32 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"              Buffers: shared hit=1"
"              ->  Seq Scan on appl_cost_item_text citx  (cost=0.00..1.80 rows=32 width=26) (actual time=0.008..0.015 rows=32 loops=1)"
"                    Filter: (language_id = 'DE'::bpchar)"
"                    Rows Removed by Filter: 32"
"                    Buffers: shared hit=1"
"  ->  Hash  (cost=1.05..1.05 rows=1 width=296) (actual time=0.418..0.419 rows=2 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        Buffers: shared hit=1"
"        ->  Seq Scan on appl_employment_status_text estx  (cost=0.00..1.05 rows=1 width=296) (actual time=0.007..0.008 rows=2 loops=1)"
"              Filter: (language_id = 'DE'::bpchar)"
"              Rows Removed by Filter: 2"
"              Buffers: shared hit=1"
"Planning Time: 2.079 ms"
"Execution Time: 7247.830 ms"

Almost 80 % of the time is spent on the LEFT JOINs.

More information:

PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
I configured enough work_mem so nothing happens on disc.
According to the query planner a Hash Left Join is used which should already be the quickest.
From my understanding using an index will not help because all rows from the big table are read and half of the rows from each small table are read because there are currently only two languages (DE, EN).
A poor try was to UNION all small tables first because they all have the same column structure. Then I did a LEFT JOIN of the result multiple times. But no improvement.

Is there any way to speed this up? Like parallel hash joins? How does that work? Or is setting other parameters an option?

PostgreSQL: Speed up multiple JOINs of one big table with many small tables

Answers (1)

Related Questions