usert4jju7
usert4jju7

Reputation: 1813

MySQL - JOIN condition & performance issue

In this fiddle, I've added data filter conditions on employee_name, task_date and region in the JOIN clause. It seems to me I'm doing something wrong here as, in the output, I can't see the data filtered.

I expect to see data only between the dates '2015-01-03' & '2015-01-06' whereas the current query is returning data equivalent to the date filter not being present.

Also, is there a better way to write this query as the fiddle only designed to get the correct SQL whereas the data in production environment is worth several years & the query never seems to complete (It could run even more than 30 mins & keep running happily until killed). The database is quite optimized for performance in case someone is wondering if tuning the DB may be an option.

Any pointers would be really helpful here please.

INPUT/OUTPUT

enter image description here

CODE

CREATE TABLE ForgeRock
    (`task_date` date, `employee_name` varchar(7), `task_name` varchar(55), `region` varchar(100));
INSERT INTO ForgeRock
    (`task_date`, `employee_name`, `task_name`, `region`)
VALUES
    ('2015-01-01', 'A', 'task A','USA'),
    ('2015-01-01', 'B', 'task B','Russia'),
    ('2015-01-01', 'C', 'task C','USA'),
    ('2015-01-01', 'D', 'task D','USA'),
    ('2015-01-02', 'A', 'task A','Russia'),
    ('2015-01-02', 'B', 'task B','Singapore'),
    ('2015-01-02', 'C', 'task C','USA'),
    ('2015-01-02', 'D', 'task D','USA'),
    ('2015-01-03', 'A', 'task C','Australia'),
    ('2015-01-03', 'B', 'task B','London'),
    ('2015-01-03', 'C', 'task D','USA'),
    ('2015-01-03', 'D', 'task A','USA'),
    ('2015-01-03', 'C', 'task C','London'),
    ('2015-01-04', 'A', 'task B','USA'),
    ('2015-01-04', 'B', 'task A','Singapore'),
    ('2015-01-04', 'C', 'task C','USA'),
    ('2015-01-04', 'D', 'task D','India'),
    ('2015-01-05', 'A', 'task F','USA'),
    ('2015-01-05', 'B', 'task F','USA'),
    ('2015-01-05', 'C', 'task G','China'),
    ('2015-01-05', 'D', 'task B','USA'),
    ('2015-01-06', 'A', 'task Y','USA'),
    ('2015-01-06', 'B', 'task X','USA'),
    ('2015-01-06', 'C', 'task E','USA'),
    ('2015-01-06', 'D', 'task R','USA'),
    ('2015-01-07', 'A', 'task W','China'),
    ('2015-01-07', 'B', 'task O','Russia'),
    ('2015-01-07', 'C', 'task P','USA'),
    ('2015-01-07', 'D', 'task S','London'),
    ('2015-01-07', 'C', 'task E','USA'),
    ('2015-01-08', 'A', 'task E','USA'),
    ('2015-01-08', 'B', 'task W','USA'),
    ('2015-01-08', 'C', 'task C','USA'),
    ('2015-01-08', 'D', 'task B','London');

SQL QUERY

SELECT   task_date, 
         employee_name, 
         Group_concat(task_name) 
FROM     ( 
             SELECT DISTINCT a.task_date, 
                             a.employee_name, 
                             CASE 
                                 WHEN b.employee_name IS NOT NULL
                                     AND c.employee_name IS NULL THEN NULL
                                 ELSE a.task_name
                             END AS task_name 
             FROM forgerock AS a 
             LEFT OUTER JOIN forgerock AS b 
                 ON  a.employee_name = b.employee_name = 'A'
                 AND a.task_date >= '2015-01-03' 
                 AND a.task_date <= '2015-01-06' 
                 AND b.task_date >= '2015-01-03' 
                 AND b.task_date <= '2015-01-06' 
                 AND a.task_date - 1 = b.task_date
                 AND a.region = b.region = 'USA' 
             LEFT OUTER JOIN forgerock AS c 
                 ON  a.employee_name = c.employee_name = 'A'
                 AND a.task_date >= '2015-01-03' 
                 AND a.task_date <= '2015-01-06' 
                 AND c.task_date >= '2015-01-03' 
                 AND c.task_date <= '2015-01-06' 
                 AND a.task_date - 1 = c.task_date
                 AND a.task_name <> c.task_name 
                 AND a.region = c.region = 'USA' 
             ORDER BY a.task_date, 
                      a.employee_name, 
                      a.task_name) AS temp 
GROUP BY task_date, 
         employee_name

Upvotes: 0

Views: 63

Answers (2)

Sebastian
Sebastian

Reputation: 3864

Not sure what you are trying to achieve with this query, but all your joins are left outer joins so the a table (the one from the FROM) will never be filtered. You should always check the query plan which for your query simply states no where condition on the a table:

enter image description here

Simple fix would be to specify a WHERE condition on the a table or switch to inner joins.

As performance is concerned you should again have a look at the query plan. If you are filtering a large table with no index (you see only using where in the query plan, example:

enter image description here

) that usually is a sign to perform a deeper investigation.

Upvotes: 1

Daniel E.
Daniel E.

Reputation: 2480

The join is on the full table "a" no matter the date so you need to add the where (see the **) and no need to put it in the join. I'm not sure but the CASE with IS NULL doesn't work everytime, i prefer to use a coalesce :

    SELECT   task_date, 
         employee_name, 
         Group_concat(task_name) 
FROM     ( 
                         SELECT DISTINCT a.task_date, 
                                         a.employee_name, 
                                         CASE 
                                                         WHEN b.employee_name IS NOT NULL
                                                                 AND  COALESCE(c.employee_name, '00') THEN '00'
                                                         ELSE a.task_name
                                         END       AS task_name 
                         FROM            forgerock AS a 
                         LEFT OUTER JOIN forgerock AS b 
                         ON              a.employee_name = b.employee_name = 'A'
                         AND             b.task_date >= '2015-01-03' 
                         AND             b.task_date <= '2015-01-06' 
                         AND             a.task_date - 1 = b.task_date
                         AND             a.region = b.region = 'USA' 
                         LEFT OUTER JOIN forgerock AS c 
                         ON              a.employee_name = c.employee_name = 'A'
                         AND             c.task_date >= '2015-01-03' 
                         AND             c.task_date <= '2015-01-06' 
                         AND             a.task_date - 1 = c.task_date
                         AND             a.task_name <> c.task_name 
                         AND             a.region = b.region = 'USA' 
                         **WHERE  a.task_date >= '2015-01-03' AND   a.task_date <= '2015-01-06'**
                         ORDER BY        a.task_date, 
                                         a.employee_name, 
                                         a.task_name) AS temp 
GROUP BY task_date, 
         employee_name

Upvotes: 2

Related Questions