Yeswanth
Yeswanth

Reputation: 11

Postgresql - not using parallelism

I am executing a select query using full outer join across 2 tables which are in 2 different databases. I'm using Postgresql 9.6.

The query is not going with parallelism even if we set the below parameters:

work_mem=256MB,
max_worker_process=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB

This is the query:

SELECT mea.ocs_cdr_type,
       mea.ocs_time_stamp,
       mea.sum_ocs_call_cost,
       ctr.ctr_name
FROM mea_req_54 mea
   FULL OUTER JOIN country ctr ON mea.ocs_imei  = ctr.ctr_name;

This is the definition of mea_req_54:

                      Table "public.mea_req_54"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 mer_id                     | numeric(19,0)               | not null
 mer_from_dttm              | timestamp without time zone | not null
 mer_to_dttm                | timestamp without time zone | not null
 fng_id                     | numeric(19,0)               |
 ocs_imsi_number_norm       | character varying(255)      |
 ocs_account_number         | character varying(255)      |
 ocs_charging_id            | character varying(255)      |
 ocs_cdr_type               | character varying(255)      |
 ocs_bit_description        | character varying(255)      |
 ocs_time_stamp_raw         | timestamp without time zone |
 ocs_time_stamp             | timestamp without time zone |
 ocs_duration               | numeric(10,0)               |
 ocs_duration_str           | character varying(255)      |
 ocs_upload_volume          | numeric(19,0)               |
 ocs_download_volume        | numeric(19,0)               |
 sum_ocs_total_volume       | numeric(19,0)               |
 sum_ocs_call_cost          | numeric(19,0)               |
 ocs_plmn_identifier        | character varying(255)      |
 ocs_imei                   | character varying(255)      |
 ocs_user_loc_info          | character varying(255)      |
 ocs_bp_id                  | character varying(255)      |
 ocs_ref_spec_from_contract | character varying(255)      |
 ocs_subapp_in_contract_acc | character varying(255)      |
 ocs_baseline_date_bill     | timestamp without time zone |
 ocs_target_date_bill       | timestamp without time zone |
 ocs_date_of_origin_bill    | timestamp without time zone |
 ctr_id                     | numeric(10,0)               |
 ctr_iso_cd                 | character varying(255)      |
 ctr_name                   | character varying(255)      |
 dblink_run                 | numeric(10,0)               |
Indexes:
    "mea_req_54_pk" UNIQUE, btree (mer_id)

This is the definition of country:

                  Table "public.country"
       Column        |          Type          | Modifiers
---------------------+------------------------+-----------
 ctr_id              | numeric(10,0)          | not null
 ctr_iso_cd          | character varying(255) | not null
 ctr_name            | character varying(255) | not null
 system_generated_fl | character(1)           |
 ctr_delete_fl       | character(1)           | not null
 ctr_dial_code       | character varying(255) | not null
 ctr_version_id      | numeric(10,0)          | not null
 ptn_id              | numeric(10,0)          | not null
Indexes:
    "country_ak" UNIQUE, btree (ctr_name)
    "country_pk" UNIQUE, btree (ctr_id)
    "country_ss1" UNIQUE, btree (ctr_iso_cd)

This is the execution plan:

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
   Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
   ->  Seq Scan on mea_req_54 mea  (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
   ->  Hash  (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 791kB
         ->  Seq Scan on country ctr  (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
 Planning time: 0.085 ms
 Execution time: 355065.791 ms
(8 rows)

Upvotes: 1

Views: 1342

Answers (2)

Yeswanth
Yeswanth

Reputation: 11

Post reducing the parallel_tuple_cost and parallel_setup_cost parameters from the default values, the query ran with parallelism.

But I wanted to know about these parameters what exactly they are ?

Upvotes: 0

Laurenz Albe
Laurenz Albe

Reputation: 246698

The documentation is silent about that, but in backend/optimizer/path/joinpath.c, function hash_inner_and_outer, I find the following enlightening comment:

/*
 * If the joinrel is parallel-safe, we may be able to consider a
 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
 * because the outer path will be partial, and therefore we won't be
 * able to properly guarantee uniqueness.  Similarly, we can't handle
 * JOIN_FULL and JOIN_RIGHT, because they can produce false null
 * extended rows.  Also, the resulting path must not be parameterized.
 */

This makes sense – a parallel worker that scans part of mea_req_54 has no way to know if there is a row in country that does not match any of the rows in mea_req_54.

Now nested loop joins cannot be used for full outer joins, so all that remains is a parallel merge join.

I can't say if a merge join is an option here, but you may try and create an index on mea_req_54(ocs_imei) and see if that helps the optimizer choose a parallel plan.

Otherwise, you are probably out of luck.

Upvotes: 1

Related Questions