Avi
Avi

Reputation: 11

Union All Query takes too long

This question have been asked multiple times I am sure, but every case is different.

I have MySQL setup on a strong computer with 2GB RAM, it does not do too much so the computer is sufficient.

The following query has been built as a view :

create view view_orders as

select distinct 

    tbl_orders_order.order_date AS sort_col,
    tbl_orders_order.order_id AS order_id,
_utf8'website' AS src,tbl_order_users.company AS company,
tbl_order_users.phone AS phone,
tbl_order_users.full_name AS full_name,
time_format(tbl_orders_order.order_date,_utf8'%H:%i') AS c_time,
date_format(tbl_orders_order.order_date,_utf8'%d/%m/%Y') AS c_date,
tbl_orders_order.comments AS comments,
tbl_orders_order.tmp_cname AS tmp_cname,
tbl_orders_order.tmp_pname AS tmp_pname,
count(tbl_order_docfiles.docfile_id) AS number_of_files,
(case tbl_orders_order.status when 1 then _utf8'completed' when 2 then _utf8'hc' when 0 then _utf8'not-completed' when 3 then _utf8'hc-canceled' end) AS status,

tbl_orders_order.employee_name AS employee_name,
tbl_orders_order.status_date AS status_date,
tbl_orders_order.cancel_reason AS cancel_reason

 from 
        tbl_orders_order left join tbl_order_users on tbl_orders_order.user_id = tbl_order_users.user_id 
    left join 
        tbl_order_docfiles on tbl_order_docfiles.order_id = tbl_orders_order.order_id

    group by 
        tbl_orders_order.order_id 



union all

select distinct tbl_h.h_date AS sort_col,
(case tbl_h.sub_oid when 0 then tbl_h.order_number else concat(tbl_h.order_number,_utf8'-',tbl_h.sub_oid) end) AS order_id,
(case tbl_h.type when 1 then _utf8'פקס' when 2 then _utf8'email' end) AS src,_utf8'' AS company,
_utf8'' AS phone,_utf8'' AS full_name,time_format(tbl_h.h_date,_utf8'%H:%i') AS c_time,
date_format(tbl_h.h_date,_utf8'%d/%m/%Y') AS c_date,_utf8'' AS comments,tbl_h.client_name AS tmp_cname,
tbl_h.project_name AS tmp_pname,
tbl_h.quantity AS number_of_files,
_utf8'completed' AS status,
tbl_h.computer_name AS employee_name,
_utf8'' AS status_date,
_utf8'' AS cancel_reason 

from tbl_h;

The query used UNION, than I read an article about UNION ALL and now uses that.

Query alone takes about 3 seconds to execute (UNION took 4.5-5.5 seconds) Each part in seperate runs in seconds.

The application does sorting and select on this view, which makes it processing time even larger - about 6 seconds when query is cached, about 12 seconds or more if data has changed.

I see no other way to combine these two results, as both sorted needs to display to the user and I guess something I am doing is wrong.

Of course both tables uses primary keys.

UPDATE!!!!

It didn't help, I got the utf8/case/date_format out of the union query, and removed distincts, now query takes 4 seconds (even longer). query without case/date/utf8 (only union) was shortened to 2.3 seconds (0.3 seconds improvement).

create view view_orders as

select *,
    (CASE src
        WHEN 1 THEN
             _utf8'fax' 
        WHEN 2 THEN 
            _utf8'mail' 
        WHEN 3 THEN
            _utf8'website'
    END) AS src,

    time_format(order_date,'%H:%i') AS c_time,
    date_format(order_date,'%d/%m/%Y') AS c_date,

    (CASE status 
        WHEN 1 THEN 
            _utf8'completed' 
        WHEN 2 THEN
             _utf8'hc handling' 
        WHEN 0 THEN
             _utf8'not completed' 
        WHEN 3 THEN
             _utf8'canceled'
    END) AS status

FROM
(
select 

    o.order_date AS sort_col,
    o.order_id,
    3 AS src,

    u.company,
    u.phone,
    u.full_name,

    o.order_date,

    o.comments,
    o.tmp_cname,
    o.tmp_pname,
    count(doc.docfile_id) AS number_of_files,

    o.status,

    o.employee_name,
    o.status_date,
    o.cancel_reason

 from 
        tbl_orders_order o
    LEFT JOIN
        tbl_order_users u ON u.user_id = o.user_id
    LEFT JOIN
        tbl_order_docfiles doc ON doc.order_id = o.order_id

    GROUP BY
        o.order_id 

union all

select 

    h.h_date AS sort_col,
    (case h.sub_oid when 0 then h.order_number else concat(h.order_number,'-',h.sub_oid) end) AS order_id,
    h.type as src,

    '' AS company,
    '' AS phone,
    '' AS full_name,

    h.h_date,

    '' AS comments,
    h.client_name AS tmp_cname,
    h.project_name AS tmp_pname,
    h.quantity AS number_of_files,
    1 AS status,

    h.computer_name AS employee_name,
    '' AS status_date,
    '' AS cancel_reason 

from tbl_h h

)

Upvotes: 1

Views: 6056

Answers (3)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115520

Can you try this one:

SELECT 
    o.order_date AS sort_col,
    o.order_id AS order_id,
    _utf8'website' AS src,
    u.company AS company,
    u.phone AS phone,
    u.full_name AS full_name,
    time_format(o.order_date,_utf8'%H:%i') AS c_time,
    date_format(o.order_date,_utf8'%d/%m/%Y') AS c_date,
    o.comments AS comments,
    o.tmp_cname AS tmp_cname,
    o.tmp_pname AS tmp_pname,
    COALESCE(d.number_of_files, 0) AS number_of_files,
    ( CASE o.status WHEN 1 THEN _utf8'completed'
                    WHEN 2 THEN _utf8'hc' 
                    WHEN 0 THEN _utf8'not-completed' 
                    WHEN 3 THEN _utf8'hc-canceled' 
      END ) AS status, 
    o.employee_name AS employee_name,
    o.status_date AS status_date,
    o.cancel_reason AS cancel_reason    
 FROM 
       tbl_orders_order AS o
   LEFT JOIN 
       tbl_order_users AS u
     ON o.user_id = u.user_id 
   LEFT JOIN
       ( SELECT order_id
              , COUNT(*) AS number_of_files
         FROM tbl_order_docfiles
         GROUP BY order_id 
       ) AS d
     ON d.order_id = o.order_id

UNION ALL

SELECT 
    tbl_h.h_date AS sort_col,
  ...

FROM tbl_h

Upvotes: 0

Lukas Eder
Lukas Eder

Reputation: 220762

Think about your using UNION and DISTINCT keywords. Can your query really result in duplicate rows? If yes, the optimal query for removing duplicates would probably be of this form:

SELECT ... -- No "DISTINCT" here
UNION
SELECT ... -- No "DISTINCT" here

There is probably no need for DISTINCT in the two subqueries. If duplicates are impossible anyway, try using this form instead. This will be the fastest execution of your query (without further optimising the subqueries):

SELECT ... -- No "DISTINCT" here
UNION ALL
SELECT ... -- No "DISTINCT" here

Rationale: Both UNION and DISTINCT apply a "UNIQUE SORT" operation on your intermediate result sets. Depending on how much data your subqueries return, this can be very expensive. That's one reason why omitting DISTINCT and replacing UNION by UNION ALL is much faster.

UPDATE Another idea, if you do have to remove duplicates: Remove duplicates first in an inner query, and format dates and codes only afterwards in an outer query. That will accelerate the "UNIQUE SORT" operation because comparing 32/64-bit integers is less expensive than comparing varchars:

SELECT a, b, date_format(c), case d when 1 then 'completed' else '...' end
FROM (
  SELECT a, b, c, d ... -- No date format here
  UNION
  SELECT a, b, c, d ... -- No date format here
)

Upvotes: 4

Shiraz Bhaiji
Shiraz Bhaiji

Reputation: 65381

It may be related to the UNION triggering a character set conversion. For example cancel_reason in the one query is defined as utf8, but in the other it is not specified.

Check if there is a very high cpu spike when you run this query, this would indicate conversion.

Personally I would have done a union of the raw data first, and then applied the case and conversion statements. But I am not sure that that would make a difference in the performance.

Upvotes: 0

Related Questions