Reputation: 11
I am executing a select query using full outer join across 2 tables which are in 2 different databases. I'm using Postgresql 9.6.
The query is not going with parallelism even if we set the below parameters:
work_mem=256MB,
max_worker_process=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB
This is the query:
SELECT mea.ocs_cdr_type,
mea.ocs_time_stamp,
mea.sum_ocs_call_cost,
ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
This is the definition of mea_req_54
:
Table "public.mea_req_54"
Column | Type | Modifiers
----------------------------+-----------------------------+-----------
mer_id | numeric(19,0) | not null
mer_from_dttm | timestamp without time zone | not null
mer_to_dttm | timestamp without time zone | not null
fng_id | numeric(19,0) |
ocs_imsi_number_norm | character varying(255) |
ocs_account_number | character varying(255) |
ocs_charging_id | character varying(255) |
ocs_cdr_type | character varying(255) |
ocs_bit_description | character varying(255) |
ocs_time_stamp_raw | timestamp without time zone |
ocs_time_stamp | timestamp without time zone |
ocs_duration | numeric(10,0) |
ocs_duration_str | character varying(255) |
ocs_upload_volume | numeric(19,0) |
ocs_download_volume | numeric(19,0) |
sum_ocs_total_volume | numeric(19,0) |
sum_ocs_call_cost | numeric(19,0) |
ocs_plmn_identifier | character varying(255) |
ocs_imei | character varying(255) |
ocs_user_loc_info | character varying(255) |
ocs_bp_id | character varying(255) |
ocs_ref_spec_from_contract | character varying(255) |
ocs_subapp_in_contract_acc | character varying(255) |
ocs_baseline_date_bill | timestamp without time zone |
ocs_target_date_bill | timestamp without time zone |
ocs_date_of_origin_bill | timestamp without time zone |
ctr_id | numeric(10,0) |
ctr_iso_cd | character varying(255) |
ctr_name | character varying(255) |
dblink_run | numeric(10,0) |
Indexes:
"mea_req_54_pk" UNIQUE, btree (mer_id)
This is the definition of country
:
Table "public.country"
Column | Type | Modifiers
---------------------+------------------------+-----------
ctr_id | numeric(10,0) | not null
ctr_iso_cd | character varying(255) | not null
ctr_name | character varying(255) | not null
system_generated_fl | character(1) |
ctr_delete_fl | character(1) | not null
ctr_dial_code | character varying(255) | not null
ctr_version_id | numeric(10,0) | not null
ptn_id | numeric(10,0) | not null
Indexes:
"country_ak" UNIQUE, btree (ctr_name)
"country_pk" UNIQUE, btree (ctr_id)
"country_ss1" UNIQUE, btree (ctr_iso_cd)
This is the execution plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Hash Full Join (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
-> Seq Scan on mea_req_54 mea (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
-> Hash (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 791kB
-> Seq Scan on country ctr (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
Planning time: 0.085 ms
Execution time: 355065.791 ms
(8 rows)
Upvotes: 1
Views: 1342
Reputation: 11
Post reducing the parallel_tuple_cost and parallel_setup_cost parameters from the default values, the query ran with parallelism.
But I wanted to know about these parameters what exactly they are ?
Upvotes: 0
Reputation: 246698
The documentation is silent about that, but in backend/optimizer/path/joinpath.c
, function hash_inner_and_outer
, I find the following enlightening comment:
/*
* If the joinrel is parallel-safe, we may be able to consider a
* partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
* because the outer path will be partial, and therefore we won't be
* able to properly guarantee uniqueness. Similarly, we can't handle
* JOIN_FULL and JOIN_RIGHT, because they can produce false null
* extended rows. Also, the resulting path must not be parameterized.
*/
This makes sense – a parallel worker that scans part of mea_req_54
has no way to know if there is a row in country
that does not match any of the rows in mea_req_54
.
Now nested loop joins cannot be used for full outer joins, so all that remains is a parallel merge join.
I can't say if a merge join is an option here, but you may try and create an index on mea_req_54(ocs_imei)
and see if that helps the optimizer choose a parallel plan.
Otherwise, you are probably out of luck.
Upvotes: 1