Jan
Jan

Reputation: 180

PostgreSQL: Speed up multiple JOINs of one big table with many small tables

I am trying to speed up my PostgreSQL reporting query. I simplyfied the data model for better understanding.

My reporting query uses the following tables:

The big table looks like this:

CREATE TABLE big_table (
    id character varying(100) NOT NULL,
    attr_1_id character varying(100) NOT NULL,
    attr_2_id character varying(100) NOT NULL,
    attr_3_id character varying(100) NOT NULL,
    ...
    CONSTRAINT big_table_pkey PRIMARY KEY (id)
);

And the small tables all look like this:

CREATE TABLE small_table_1 (
    id character varying(100) NOT NULL,
    language_id character(2) NOT NULL,
    text character varying(100),
    CONSTRAINT small_table_1_pkey PRIMARY KEY (id)
);

The query looks like this:

SELECT
    big_table.attr_1_id,
    small_table_1.text,

    big_table.attr_2_id,
    small_table_2.text,

    big_table.attr_3_id,
    small_table_3.text,

    ...
    
FROM big_table

LEFT JOIN small_table_1
ON small_table_1.id = big_table.attr_id_1
AND small_table_1.language_id = 'DE' 

LEFT JOIN small_table_2
ON small_table_2.id = big_table.attr_id_2
AND small_table_2.language_id = 'DE'

LEFT JOIN small_table_3
ON small_table_3.id = big_table.attr_id_3
AND small_table_3.language_id = 'DE'

...

With ~500.000 rows in the big table every LEFT JOIN already takes ~200ms. So with millions of rows and many small tables to join the execution time grows quickly.

In the query planner each LEFT JOIN looks like this:

Hash Left Join  (cost=31.94..13727.27 rows=432242 width=148) (actual time=4.398..581.194 rows=432242 loops=1)
    Hash Cond: ((small_table_1.id)::text = (big_table.attr_id_1)::text)

Here is the complete real query plan. I still had to remove the LEFT JOINs of 10 small tables to not hit the Stack Overflow max character limit of 30,000.

"QUERY PLAN"
"Hash Left Join  (cost=208.64..400076.38 rows=460222 width=3163) (actual time=7.508..7178.265 rows=493850 loops=1)"
"  Hash Cond: ((""*SELECT* 1"".employment_status_id)::text = (estx.id)::text)"
"  Buffers: shared hit=8290"
"  CTE q2"
"    ->  Seq Scan on cust_group  (cost=0.00..13.00 rows=300 width=16) (actual time=0.002..0.004 rows=1 loops=1)"
"          Buffers: shared hit=1"
"  InitPlan 2 (returns $1)"
"    ->  CTE Scan on q2  (cost=0.00..6.00 rows=300 width=16) (actual time=0.001..0.001 rows=1 loops=1)"
"  InitPlan 3 (returns $2)"
"    ->  CTE Scan on q2 q2_1  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.000 rows=1 loops=1)"
"  InitPlan 4 (returns $3)"
"    ->  CTE Scan on q2 q2_2  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.001 rows=1 loops=1)"
"  InitPlan 5 (returns $4)"
"    ->  CTE Scan on q2 q2_3  (cost=0.00..6.00 rows=300 width=16) (actual time=0.004..0.005 rows=1 loops=1)"
"          Buffers: shared hit=1"
"  InitPlan 6 (returns $5)"
"    ->  CTE Scan on q2 q2_4  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.001 rows=1 loops=1)"
"  InitPlan 7 (returns $6)"
"    ->  CTE Scan on q2 q2_5  (cost=0.00..6.00 rows=300 width=16) (actual time=0.000..0.000 rows=1 loops=1)"
"  ->  Hash Left Join  (cost=158.57..338963.78 rows=460222 width=2513) (actual time=6.695..4724.369 rows=493850 loops=1)"
"        Hash Cond: (((NULL::character varying))::text = (citx.id)::text)"
"        Buffers: shared hit=8289"
"        ->  Hash Left Join  (cost=156.37..337727.90 rows=460222 width=2490) (actual time=5.361..4492.058 rows=493850 loops=1)"
"              Hash Cond: (""*SELECT* 1"".action_type_id = attx.id)"
"              Buffers: shared hit=8288"
"              ->  Hash Left Join  (cost=155.00..336492.84 rows=460222 width=2478) (actual time=3.754..4261.646 rows=493850 loops=1)"
"                    Hash Cond: ((""*SELECT* 1"".job_id)::text = (jbtx.id)::text)"
"                    Buffers: shared hit=8287"
"                    ->  Hash Left Join  (cost=128.25..335232.41 rows=460222 width=2457) (actual time=1.704..4048.729 rows=493850 loops=1)"
"                          Hash Cond: ((""*SELECT* 1"".employee_type_id)::text = (ettx.id)::text)"
"                          Buffers: shared hit=8264"
"                          ->  Nested Loop Left Join  (cost=126.96..333997.44 rows=460222 width=2239) (actual time=0.642..3788.566 rows=493850 loops=1)"
"                                Buffers: shared hit=8263"
"                                ->  Nested Loop Left Join  (cost=126.81..328236.49 rows=460222 width=2222) (actual time=0.622..3478.490 rows=493850 loops=1)"
"                                      Buffers: shared hit=8261"
"                                      ->  Hash Left Join  (cost=126.67..322475.55 rows=460222 width=2205) (actual time=0.612..3155.816 rows=493850 loops=1)"
"                                            Hash Cond: ((""*SELECT* 1"".org_unit_id)::text = (outx.id)::text)"
"                                            Buffers: shared hit=8259"
"                                            ->  Hash Left Join  (cost=98.98..321214.18 rows=460222 width=2185) (actual time=0.541..2969.331 rows=493850 loops=1)"
"                                                  Hash Cond: ((""*SELECT* 1"".cost_center_id)::text = (cctx.id)::text)"
"                                                  Buffers: shared hit=8236"
"                                                  ->  Hash Left Join  (cost=96.93..319978.45 rows=460222 width=2175) (actual time=0.519..2712.470 rows=493850 loops=1)"
"                                                        Hash Cond: (coat.country_id = (cutx.id)::bpchar)"
"                                                        Buffers: shared hit=8235"
"                                                        ->  Hash Left Join  (cost=83.59..313061.78 rows=460222 width=2163) (actual time=0.412..2404.622 rows=493850 loops=1)"
"                                                              Hash Cond: ((""*SELECT* 1"".company_id)::text = (cotx.id)::text)"
"                                                              Buffers: shared hit=8231"
"                                                              ->  Hash Left Join  (cost=71.45..311815.96 rows=460222 width=1945) (actual time=0.382..2131.789 rows=493850 loops=1)"
"                                                                    Hash Cond: ((NULL::bpchar) = (crtx_le.id)::bpchar)"
"                                                                    Buffers: shared hit=8230"
"                                                                    ->  Nested Loop Left Join  (cost=61.14..305017.38 rows=460222 width=1928) (actual time=0.298..1886.031 rows=493850 loops=1)"
"                                                                          Buffers: shared hit=8227"
"                                                                          ->  Hash Left Join  (cost=60.99..299256.43 rows=460222 width=1896) (actual time=0.282..1584.799 rows=493850 loops=1)"
"                                                                                Hash Cond: ((""*SELECT* 1"".employee_id)::text = (emat.id)::text)"
"                                                                                Buffers: shared hit=8225"
"                                                                                ->  Hash Left Join  (cost=47.75..298019.21 rows=460222 width=1871) (actual time=0.131..1287.387 rows=493850 loops=1)"
"                                                                                      Hash Cond: (""*SELECT* 1"".scenario_id = scen.id)"
"                                                                                      Buffers: shared hit=8219"
"                                                                                      ->  Hash Left Join  (cost=46.70..296784.48 rows=460222 width=1853) (actual time=0.122..1026.567 rows=493850 loops=1)"
"                                                                                            Hash Cond: (""*SELECT* 1"".action_id = aclg.action_id)"
"                                                                                            Buffers: shared hit=8218"
"                                                                                            ->  Hash Left Join  (cost=38.02..292886.93 rows=460222 width=1789) (actual time=0.046..883.004 rows=493850 loops=1)"
"                                                                                                  Hash Cond: ((""*SELECT* 1"".company_id)::text = (coat.id)::text)"
"                                                                                                  Buffers: shared hit=8212"
"                                                                                                  ->  Hash Left Join  (cost=21.05..291645.36 rows=460222 width=1773) (actual time=0.035..712.011 rows=493850 loops=1)"
"                                                                                                        Hash Cond: ((NULL::bpchar) = (conv_gr.from_currency_id)::bpchar)"
"                                                                                                        Buffers: shared hit=8211"
"                                                                                                        ->  Append  (cost=0.00..285204.21 rows=460222 width=1742) (actual time=0.013..465.417 rows=493850 loops=1)"
"                                                                                                              Buffers: shared hit=8209"
"                                                                                                              ->  Subquery Scan on ""*SELECT* 1""  (cost=0.00..2739.92 rows=68841 width=178) (actual time=0.012..55.052 rows=68841 loops=1)"
"                                                                                                                    Buffers: shared hit=1191"
"                                                                                                                    ->  Seq Scan on appl_plan_headcount  (cost=0.00..1879.41 rows=68841 width=150) (actual time=0.010..32.481 rows=68841 loops=1)"
"                                                                                                                          Buffers: shared hit=1191"
"                                                                                                              ->  Subquery Scan on ""*SELECT* 2""  (cost=3405.85..280163.18 rows=391381 width=151) (actual time=36.277..350.251 rows=425009 loops=1)"
"                                                                                                                    Buffers: shared hit=7018"
"                                                                                                                    ->  Gather  (cost=3405.85..274292.47 rows=391381 width=95) (actual time=36.275..231.496 rows=425009 loops=1)"
"                                                                                                                          Workers Planned: 2"
"                                                                                                                          Workers Launched: 2"
"                                                                                                                          Buffers: shared hit=7018"
"                                                                                                                          ->  Parallel Hash Join  (cost=2405.85..234154.37 rows=163075 width=95) (actual time=26.658..228.421 rows=141670 loops=3)"
"                                                                                                                                Hash Cond: ((co.scenario_id = hc.scenario_id) AND ((co.employee_id)::text = (hc.employee_id)::text) AND (co.cal_year_month_id = hc.cal_year_month_id) AND (co.action_id = hc.action_id))"
"                                                                                                                                Buffers: shared hit=7018"
"                                                                                                                                ->  Parallel Seq Scan on appl_plan_costs co  (cost=0.00..7464.01 rows=180101 width=45) (actual time=0.007..52.792 rows=144081 loops=3)"
"                                                                                                                                      Buffers: shared hit=5663"
"                                                                                                                                ->  Parallel Hash  (cost=1595.95..1595.95 rows=40495 width=62) (actual time=25.189..25.189 rows=22947 loops=3)"
"                                                                                                                                      Buckets: 131072  Batches: 1  Memory Usage: 8096kB"
"                                                                                                                                      Buffers: shared hit=1191"
"                                                                                                                                      ->  Parallel Seq Scan on appl_plan_headcount hc  (cost=0.00..1595.95 rows=40495 width=62) (actual time=0.009..10.810 rows=22947 loops=3)"
"                                                                                                                                            Buffers: shared hit=1191"
"                                                                                                        ->  Hash  (cost=21.00..21.00 rows=4 width=48) (actual time=0.016..0.017 rows=6 loops=1)"
"                                                                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                              Buffers: shared hit=2"
"                                                                                                              ->  Seq Scan on cust_currency_conversion conv_gr  (cost=0.00..21.00 rows=4 width=48) (actual time=0.011..0.013 rows=6 loops=1)"
"                                                                                                                    Filter: ((to_currency_id)::text = ($4)::text)"
"                                                                                                                    Rows Removed by Filter: 6"
"                                                                                                                    Buffers: shared hit=2"
"                                                                                                  ->  Hash  (cost=13.10..13.10 rows=310 width=234) (actual time=0.006..0.006 rows=8 loops=1)"
"                                                                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                        Buffers: shared hit=1"
"                                                                                                        ->  Seq Scan on appl_company_attr coat  (cost=0.00..13.10 rows=310 width=234) (actual time=0.003..0.004 rows=8 loops=1)"
"                                                                                                              Buffers: shared hit=1"
"                                                                                            ->  Hash  (cost=7.50..7.50 rows=94 width=72) (actual time=0.070..0.072 rows=91 loops=1)"
"                                                                                                  Buckets: 1024  Batches: 1  Memory Usage: 17kB"
"                                                                                                  Buffers: shared hit=6"
"                                                                                                  ->  Hash Left Join  (cost=3.18..7.50 rows=94 width=72) (actual time=0.016..0.053 rows=91 loops=1)"
"                                                                                                        Hash Cond: ((aclg.created_by)::text = (ausr.id)::text)"
"                                                                                                        Buffers: shared hit=6"
"                                                                                                        ->  Seq Scan on appl_plan_action_log aclg  (cost=0.00..3.94 rows=94 width=58) (actual time=0.003..0.013 rows=91 loops=1)"
"                                                                                                              Buffers: shared hit=3"
"                                                                                                        ->  Hash  (cost=3.08..3.08 rows=8 width=24) (actual time=0.008..0.009 rows=5 loops=1)"
"                                                                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                                              Buffers: shared hit=3"
"                                                                                                              ->  Seq Scan on auth_user ausr  (cost=0.00..3.08 rows=8 width=24) (actual time=0.003..0.007 rows=5 loops=1)"
"                                                                                                                    Buffers: shared hit=3"
"                                                                                      ->  Hash  (cost=1.02..1.02 rows=2 width=22) (actual time=0.005..0.005 rows=4 loops=1)"
"                                                                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                                            Buffers: shared hit=1"
"                                                                                            ->  Seq Scan on appl_scenarios scen  (cost=0.00..1.02 rows=2 width=22) (actual time=0.003..0.003 rows=4 loops=1)"
"                                                                                                  Buffers: shared hit=1"
"                                                                                ->  Hash  (cost=9.22..9.22 rows=322 width=31) (actual time=0.146..0.146 rows=370 loops=1)"
"                                                                                      Buckets: 1024  Batches: 1  Memory Usage: 33kB"
"                                                                                      Buffers: shared hit=6"
"                                                                                      ->  Seq Scan on appl_employee_attr emat  (cost=0.00..9.22 rows=322 width=31) (actual time=0.005..0.057 rows=370 loops=1)"
"                                                                                            Buffers: shared hit=6"
"                                                                          ->  Materialize  (cost=0.15..8.18 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                                                                Buffers: shared hit=2"
"                                                                                ->  Index Scan using cust_currency_conversion_pkey on cust_currency_conversion conv_ta  (cost=0.15..8.17 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1)"
"                                                                                      Index Cond: (((from_currency_id)::text = ($5)::text) AND ((to_currency_id)::text = 'USD'::text))"
"                                                                                      Buffers: shared hit=2"
"                                                                    ->  Hash  (cost=7.88..7.88 rows=195 width=21) (actual time=0.080..0.081 rows=195 loops=1)"
"                                                                          Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"                                                                          Buffers: shared hit=3"
"                                                                          ->  Seq Scan on appl_currency_text crtx_le  (cost=0.00..7.88 rows=195 width=21) (actual time=0.007..0.051 rows=195 loops=1)"
"                                                                                Filter: (language_id = 'DE'::bpchar)"
"                                                                                Rows Removed by Filter: 195"
"                                                                                Buffers: shared hit=3"
"                                                              ->  Hash  (cost=12.12..12.12 rows=1 width=436) (actual time=0.007..0.007 rows=8 loops=1)"
"                                                                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                                    Buffers: shared hit=1"
"                                                                    ->  Seq Scan on appl_company_text cotx  (cost=0.00..12.12 rows=1 width=436) (actual time=0.003..0.005 rows=8 loops=1)"
"                                                                          Filter: (language_id = 'DE'::bpchar)"
"                                                                          Rows Removed by Filter: 8"
"                                                                          Buffers: shared hit=1"
"                                                        ->  Hash  (cost=10.23..10.23 rows=249 width=16) (actual time=0.102..0.102 rows=249 loops=1)"
"                                                              Buckets: 1024  Batches: 1  Memory Usage: 20kB"
"                                                              Buffers: shared hit=4"
"                                                              ->  Seq Scan on appl_country_text cutx  (cost=0.00..10.23 rows=249 width=16) (actual time=0.006..0.064 rows=249 loops=1)"
"                                                                    Filter: (language_id = 'DE'::bpchar)"
"                                                                    Rows Removed by Filter: 249"
"                                                                    Buffers: shared hit=4"
"                                                  ->  Hash  (cost=1.70..1.70 rows=28 width=20) (actual time=0.015..0.016 rows=28 loops=1)"
"                                                        Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"                                                        Buffers: shared hit=1"
"                                                        ->  Seq Scan on appl_cost_center_text cctx  (cost=0.00..1.70 rows=28 width=20) (actual time=0.003..0.010 rows=28 loops=1)"
"                                                              Filter: (language_id = 'DE'::bpchar)"
"                                                              Rows Removed by Filter: 28"
"                                                              Buffers: shared hit=1"
"                                            ->  Hash  (cost=26.12..26.12 rows=125 width=29) (actual time=0.065..0.066 rows=125 loops=1)"
"                                                  Buckets: 1024  Batches: 1  Memory Usage: 16kB"
"                                                  Buffers: shared hit=23"
"                                                  ->  Seq Scan on appl_org_unit_text outx  (cost=0.00..26.12 rows=125 width=29) (actual time=0.004..0.046 rows=125 loops=1)"
"                                                        Filter: (language_id = 'DE'::bpchar)"
"                                                        Rows Removed by Filter: 125"
"                                                        Buffers: shared hit=23"
"                                      ->  Materialize  (cost=0.15..8.17 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                            Buffers: shared hit=2"
"                                            ->  Index Scan using appl_currency_text_pkey on appl_currency_text crtx_gr  (cost=0.15..8.17 rows=1 width=17) (actual time=0.007..0.008 rows=1 loops=1)"
"                                                  Index Cond: (((id)::text = ($6)::text) AND (language_id = 'DE'::bpchar))"
"                                                  Buffers: shared hit=2"
"                                ->  Materialize  (cost=0.15..8.17 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=493850)"
"                                      Buffers: shared hit=2"
"                                      ->  Index Scan using appl_currency_text_pkey on appl_currency_text crtx_ta  (cost=0.15..8.17 rows=1 width=17) (actual time=0.007..0.010 rows=1 loops=1)"
"                                            Index Cond: (((id)::text = 'USD'::text) AND (language_id = 'DE'::bpchar))"
"                                            Buffers: shared hit=2"
"                          ->  Hash  (cost=1.27..1.27 rows=1 width=436) (actual time=0.503..0.504 rows=11 loops=1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                Buffers: shared hit=1"
"                                ->  Seq Scan on appl_employee_type_text ettx  (cost=0.00..1.27 rows=1 width=436) (actual time=0.007..0.010 rows=11 loops=1)"
"                                      Filter: (language_id = 'DE'::bpchar)"
"                                      Rows Removed by Filter: 11"
"                                      Buffers: shared hit=1"
"                    ->  Hash  (cost=25.50..25.50 rows=100 width=29) (actual time=0.706..0.707 rows=100 loops=1)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 14kB"
"                          Buffers: shared hit=23"
"                          ->  Seq Scan on appl_job_text jbtx  (cost=0.00..25.50 rows=100 width=29) (actual time=0.004..0.047 rows=100 loops=1)"
"                                Filter: (language_id = 'DE'::bpchar)"
"                                Rows Removed by Filter: 100"
"                                Buffers: shared hit=23"
"              ->  Hash  (cost=1.25..1.25 rows=10 width=16) (actual time=0.409..0.410 rows=10 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                    Buffers: shared hit=1"
"                    ->  Seq Scan on appl_action_type_text attx  (cost=0.00..1.25 rows=10 width=16) (actual time=0.006..0.008 rows=10 loops=1)"
"                          Filter: (language_id = 'DE'::bpchar)"
"                          Rows Removed by Filter: 10"
"                          Buffers: shared hit=1"
"        ->  Hash  (cost=1.80..1.80 rows=32 width=26) (actual time=0.360..0.360 rows=32 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 10kB"
"              Buffers: shared hit=1"
"              ->  Seq Scan on appl_cost_item_text citx  (cost=0.00..1.80 rows=32 width=26) (actual time=0.008..0.015 rows=32 loops=1)"
"                    Filter: (language_id = 'DE'::bpchar)"
"                    Rows Removed by Filter: 32"
"                    Buffers: shared hit=1"
"  ->  Hash  (cost=1.05..1.05 rows=1 width=296) (actual time=0.418..0.419 rows=2 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        Buffers: shared hit=1"
"        ->  Seq Scan on appl_employment_status_text estx  (cost=0.00..1.05 rows=1 width=296) (actual time=0.007..0.008 rows=2 loops=1)"
"              Filter: (language_id = 'DE'::bpchar)"
"              Rows Removed by Filter: 2"
"              Buffers: shared hit=1"
"Planning Time: 2.079 ms"
"Execution Time: 7247.830 ms"

Almost 80 % of the time is spent on the LEFT JOINs.

Statistics per Node Type

More information:

Is there any way to speed this up? Like parallel hash joins? How does that work? Or is setting other parameters an option?

Upvotes: 2

Views: 6023

Answers (1)

Ramin Faracov
Ramin Faracov

Reputation: 3303

For best performance you must do these options:

  1. Create an index for fields which using in where clause
  2. Create an index for fields used in the join condition. For example: small_table_1.id = big_table.attr_id_1. Here field id may be a primary key, but you must create an index for field attr_id_1
  3. From my understanding using an index will not help because all rows are read. - No, indexes are very important, you should definitely use indexes. Indexes increase performance on the process of the joining table and filtering data.

ATTENTION!!! ATTENTION!!! I accidentally saw one bad code on your query plan, this is very bad for the performance.

Hash Left Join  (cost=31.94..13727.27 rows=432242 width=148) (actual time=4.398..581.194 rows=432242 loops=1)
    Hash Cond: ((small_table_1.id)::text = (big_table.attr_id_1)::text)

The database automatically converted id fields to text type, why? This is will very very decrease your performance. May be small_table_1.id and big_table.attr_id_1 types are difference. Please control all table's id field types. These fields must be the same type.

Upvotes: 2

Related Questions