D Durham
D Durham

Reputation: 1341

How can you reference a column based on the average on another column in MySQL?

We have a scenario where users answer some questions related to a parent entity that we'll call a widget. Each question has both a numeric and word answer. Multiple users answer each question for a given widget.

We then display a row for each widget with the average numeric answer for each question. We do that using a MySQL pseudo-pivot with dynamic columns as detailed here So we end up with something like:

SELECT widget_id, ...
ROUND(IFNULL(AVG(CASE 
      WHEN LOWER(REPLACE(RQ.question, ' ', '_')) = 'overall_size' THEN 
      if(RA.num = '', 0, RA.num) END),0) + .0001, 2) AS `raw_avg_overall_size`,
... 

... where overall_size would be one of the question types related to the widget and might have "answers" from 5 users like 1,2,2,3,1 to that question for a given widget_id based on the answer options below:

Answers

answer_id answer_type num word
111 overall_size 1 x-large
112 overall_size 2 large
113 overall_size 3 medium
114 overall_size 4 small
115 overall_size 5 x-small

So we would end up with a row that had something like this:

widget_id average_overall_size
115 1.80

What we can't figure out is then given if we round 1.80 to zero precision we get 2 in this example which is the word value 'large' from our data above. We like to include that in the query output too so that end up with:

widget_id raw_average_overall_size average_overall_size
115 1.80 large

The issue is that we do not know the average for the row until the query runs. So how can we then reference the word value for that average answer in the same row when executing the query?

As mentioned we are pivoting into a variable and then run another query for the full execution. So if we join in the pivot section, that subquery looks something like this:

SET @phase_id = 1;
SET SESSION group_concat_max_len = 100000;
SET @SQL = NULL;
SET @NSQL = NULL;

SELECT GROUP_CONCAT(DISTINCT
  CONCAT(
   'ROUND(IFNULL(AVG(CASE 
      WHEN LOWER(REPLACE(RQ.short_question, '' '', ''_'')) = ''',
      nsq,
      ''' THEN 
      if(RA.answer = '''', 0, RA.answer) END),0) + .0001, 2) AS `',
      CONCAT('avg_raw_',nsq), '`,
      REF.value, -- <- ******* THIS FAILS **** --
      ROUND(IFNULL(STDDEV(CASE 
      WHEN LOWER(REPLACE(RQ.short_question, '' '', ''_'')) = ''',
      nsq,
      ''' THEN RA.answer  END), 0) + .0001, 3) AS `',
      CONCAT('std_dev_', nsq), '`
    '
  )
  ORDER BY display_order
) INTO @NSQL
FROM (
  SELECT FD.ref_value, FD.element_name, RQ.display_order, LOWER(REPLACE(RQ.short_question, ' ', '_')) as nsq
  FROM review_questions RQ
  LEFT JOIN form_data FD ON FD.id = RQ.form_data_id
  LEFT JOIN ref_values RV on FD.ref_value = RV.type 
  WHERE RQ.phase_id = @phase_id 
    AND FD.element_type = 'select' 
    AND RQ.is_active > 0
    GROUP BY FD.element_name
    HAVING MAX(RV.key_name) REGEXP '^[0-9]+$'

) nq
/****** suggested in 1st answer ******/
LEFT JOIN ref_values REF ON REF.`type` = nq.ref_value
AND REF.key_name = ROUND(CONCAT('avg_raw_',nsq), 0);

So we need the word answer (from the REF join's REF.value field in the above code) in the pivot output, but it fails with 'Unknown column REF.value. If we put REF.value in it's parent query field list, that also fails with the same error.

Upvotes: 2

Views: 137

Answers (3)

SargeATM
SargeATM

Reputation: 2841

Query into temp table, then join

First query should produce table as follows:

CREATE temp table, temp_average_size

widget_id average_overall_size rounded_average_size
115 1.80 2

LEFT JOIN

select s.*, a.word
from temp_average_size s LEFT JOIN answers a 
    ON (s.rounded_average_size = a.num AND a.answer_type = 'overall_size)

Upvotes: 0

Rick James
Rick James

Reputation: 142218

This fails, leading to the default of "2":

LOWER(REPLACE(RQ.question, ' ', '_')) = 'overall_size'

That is because the question seems to be "average_overall_size", not "overall_size".

String parsing and manipulation is the pits in SQL; suggest using the application to handle such.

Also, be aware that you may need a separate subquery to compute aggregate (eg AVG()), else it might not be computed over the set of values you think.

Upvotes: 0

The Impaler
The Impaler

Reputation: 48770

You'll need to join the table/view/query again to get the 'large' value.

For example:

select a.*, b.word
from (
  -- your query here
) a
join my_table b on b.answer_id = a.answer_id
               and b.num = round(a.num);
 

An index on my_table (answer_id, num) will speed up the extra search.

Upvotes: 1

Related Questions