Reputation: 2092
Problem is to calculate the median of values that are distributed in a table. I have multiple values in a table column against the same uids, Where I need to find the median of those values for each uid. In order to do that, first I concatenated the values and then trying to find the median among the grouped values against each uid. After group concatenation, I have got the values into a comma separated list.
Now, I need to find the median from the comma separated values given in the following table in column "text_responded_in_hour".
uId |text_responded_in_hour
----|-----------------------
176 |70,660,70
177 |102
194 |102,102
I have tried Google but didn't able to find any related solution.
Upvotes: 0
Views: 768
Reputation: 2092
The following code for calculating medians can work for even and odd number of values. This piece of code served the purpose :)
SELECT
uId,
date,
(SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(responded_text_time_in_hour ORDER BY responded_text_time_in_hour),
',',
((
ROUND(
LENGTH(GROUP_CONCAT(responded_text_time_in_hour)) -
LENGTH(
REPLACE(
GROUP_CONCAT(responded_text_time_in_hour),
',',
''
)
)
) / 2) + 1
)),
',',
-1
) +
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(responded_text_time_in_hour ORDER BY responded_text_time_in_hour), ',', (COUNT(*)/2) )
, ',', -1))/2 as median
FROM outTable
WHERE
(responded_text_time_in_hour>0 AND responded_text_time_in_hour <=3600)
GROUP BY 1,2
Upvotes: 0
Reputation: 2862
You really should use programming language, like python, to do that. You can't do that easily with MySQL, you could do it with postgres or MSSQL or any other dbms for that matter. In MySQL you can hack it if you know how many values there are, by using json functions. But this looks wrong (applies to MySQL 5.7.9+):
insert into yourtable
select uId as id, concat('[', text_responded_in_hour, ']') as jsoncol
from startingtable;
select ID, AVG(val)
(
select id, jsoncol->'$[0]' as val from yourtable
union all select id, jsoncol->'$[1]' from yourtable
union all select id, jsoncol->'$[2]' from yourtable
-- as many times as needed
) as a
group by ID
Upvotes: 0
Reputation: 21533
Doing this isn't really viable. Comma separated lists are generally a very bad idea in a database due to problems like this.
It is potentially possible, if not quick or flexible.
The following does it by generating a list of numbers from 1 to 1000 (or 1 more than the number of commas). It only works if the max number of comma separated values is less than 1000. Can easily be expanded to cope with larger numbers but will become even less efficient as it is expanded.
It then uses that range of numbers to extract a value from the comma separated list.
Once that is done the AVG function can be used.
SELECT uID,
AVG(individual_responded_in_hour)
FROM
(
SELECT uID,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(text_responded_in_hour, ',', (hundreds.aCnt * 100 + tens.aCnt * 10 + units.aCnt + 1)), ',', -1) AS SIGNED) AS individual_responded_in_hour
FROM some_table
CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) units
CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) tens
CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) hundreds
WHERE (hundreds.aCnt * 100 + tens.aCnt * 10 + units.aCnt) <= (LENGTH(text_responded_in_hour) - LENGTH((REPLACE(text_responded_in_hour, ',', '')))
) sub0
GROUP BY uID
It would be possible to do this using a custom written MySQL function which might be more efficient.
But either way I suggest that the complexity of doing this is more of a good reason to not store the values in a comma separated list.
Upvotes: 3