Reputation: 4323
I'm getting data using a query like this:
SELECT `product_name`, `Number of Cases` FROM `myTable` where `color` = 'red' order by `Number of Cases` DESC";
I get back an array like this if I var_dump
:
array(2) {
[0]=> array(4) { ["product_name"]=> string(5) "23418" [0]=> string(5) "23418" ["Number of Cases"]=> string(3) "434" [1]=> string(3) "434" }
[1]=> array(4) { ["product_name"]=> string(5) "79746" [0]=> string(5) "79746" ["Number of Cases"]=> string(3) "372" [1]=> string(3) "372" }
}
The above is an example, but obviously, I have a lot more rows in the real table.
Here's what I'm looking to do...
Get all the values for "Number of Cases" and break them into quintiles (5 buckets with an equal number of values in each "bucket") or deciles (10 buckets with an equal number of values in each "bucket"). I need to return the highest value (for "number of cases") for each quintile or decile.
I see the flow being like this:
I know the steps, but I can't translate them into working PHP.
Any insights would help.
Thanks.
Upvotes: 3
Views: 1165
Reputation: 1270391
You can add a tile
column into the data in MySQL, by using variables. Here is an example for both quintiles and deciles:
SELECT t.*,
floor((seqnum - 1) * 5 / @rn) as tile_5,
floor((seqnum - 1) * 10 / @rn) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
FROM `myTable` CROSS JOIN
(SELECT @rn := 0) params
WHERE `color` = 'red'
ORDER BY `Number of Cases` DESC
) t
EDIT:
The above should work, but perhaps something funky is happening with @rn
. If so:
SELECT t.*,
floor((seqnum - 1) * 5 / x.cnt) as tile_5,
floor((seqnum - 1) * 10 / x.cnt) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
FROM `myTable` CROSS JOIN
(SELECT @rn := 0) params
WHERE `color` = 'red'
ORDER BY `Number of Cases` DESC
) t CROSS JOIN
(SELECT COUNT(*) as cnt FROM myTable WHERE color = 'red') x;
Upvotes: 1
Reputation: 4323
I ended up with something like this for my query, which is pretty fast even on a decent-sized table:
SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 20/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `20th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 40/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `40th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 60/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `60th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases` SEPARATOR ','),
',', 80/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `80th`
FROM `myTable` where `color`="red"
This example returns 4 numbers at the 20%, 40%, 60%, 80% percentile. The four numbers are the value of the row (number of cases
) for the specific percentile.
Note that you'll likely have to increase your group_concat_max_len
in order for this to work...something like this:
$sessiont = 'SET SESSION group_concat_max_len = 10485760;';
$fixed = $dbh->query($sessiont);
Hat tip to this blog for the solution: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
Upvotes: 0