TaylorSmolik
TaylorSmolik

Reputation: 126

Optimizing query for calculating Median using GROUP BY & ORDER BY

Please see below for a sample portion of my MySQL table:

Table name: eb_tickets

+-------------------+----------------------+
|  ticket_type      |  time_first_response |
|  Standard Traffic |  0:18:14             |
|  Standard Traffic |  0:48:06             |
|  Miscellaneous    |  44:12:23            |
|  Feed             |  4:48:22             | 
|  Miscellaneous    |  15:33:20            |
|  Banners          |  21:00:02            |
|  Integration      |  36:00:02            |
+-------------------+----------------------+

I want to output the response like so, calculating the median value with distinct values ASC:

+-------------------+----------------------+
|  median_group     |  median              | 
|  Banners          |  21:00:02            |
|  Feed             |  4:48:22             | 
|  Integration      |  36:00:02            |
|  Miscellaneous    |  32:36:13            |
|  Standard Traffic |  0:33:10             |
+-------------------+----------------------+

Currently, I am achieving this with the following query:

SET @row_number:=0; 
SET @median_group:='';

SELECT 
    median_group, AVG(time_first_response) AS median
FROM
(SELECT 
    @row_number:=CASE
        WHEN @median_group = ticket_type THEN @row_number + 1
        ELSE 1
    END AS count_of_group,
    @median_group:=ticket_type AS median_group,
    ticket_type,
    time_first_response,
    (SELECT 
            COUNT(*)
        FROM
            eb_tickets
        WHERE
            a.ticket_type = ticket_type) AS total_of_group 
FROM
    (SELECT 
        ticket_type, time_first_response
    FROM
        eb_tickets           
    ORDER BY ticket_type, time_first_response) AS a) AS b
    WHERE
    count_of_group BETWEEN total_of_group / 2.0 AND total_of_group / 2.0 +1
    GROUP BY median_group

Unfortunately, this query is taking around 1 second per record, and I am querying thousands of records with the possibility of 20 different values for the ticket_type column.

Is it possible to optimize my current query in an attempt to make the query perform faster?

Upvotes: 0

Views: 85

Answers (1)

IVO GELOV
IVO GELOV

Reputation: 14259

Could you try like this?

SELECT counter.ticket_type,AVG(time_first_response) AS median FROM
  (
    SELECT 
      IF(@type = type, @ctr := @ctr + 1, @ctr := 1) AS rownum, 
      @type := ticket_type AS ticket_type,
      time_first_response
    FROM eb_tickets
    ORDER BY ticket_type,time_first_response
  ) AS counter,
  (
    SELECT ticket_type, COUNT(*) AS rows
    FROM eb_tickets
    GROUP BY ticket_type
  ) AS types
  WHERE types.ticket_type = counter.ticket_type AND
    CASE rows % 2 
      WHEN 1 THEN rownum IN (ROUND(rows / 2),ROUND(rows / 2))
      ELSE rownum IN (ROUND(rows / 2),ROUND(rows / 2) + 1)
    END
  GROUP BY counter.ticket_type

Initially, while I was experimenting on the SQL I used shorter names for the columns - type instead of ticket_type and the subquery counter was left with the wrong column name type instead of ticket_type

Upvotes: 1

Related Questions