Thandi Ngamu
Thandi Ngamu

Reputation: 87

mysql query to find most common and their neighboring numbers

I hope I phrased the question right, I Have been on this for quite sometime now looking for different approaches, I have an app that polls users for prices they think are right for a commodity then from there the app is to come up with an average price based on the most common to near common prices and averaging them issue is coming up with a query that can find those prices, so a sample of my table:

|id |commodityFk|price |dateCreated        |
--------------------------------------------
|1  |1          |1200  |2016-12-24 22:30:30|
|2  |1          |500000|2016-12-24 22:30:30|
|3  |1          |500000|2016-12-24 22:30:30|
|4  |1          |450000|2016-12-24 22:30:30|
|5  |1          |506980|2016-12-24 23:15:12|
|6  |1          |2000  |2016-12-25 23:57:06|

so from this table the most common price is 500000 but we also have prices like 506980 and 450000 which are near the common price so we expect it to average 500000, 500000, 450000 and 506980 personally I am not conversant with MySQL any help solving this will be appreciated.

Upvotes: 0

Views: 43

Answers (2)

TheGentleman
TheGentleman

Reputation: 2362

Are you looking for an entirely MySQL based solution? As mentioned in a comment, you should really define "near" more explicitly. In the below example, I've called any value within 1 Std Deviation of the average a "near" value.

Additionally, what do you do if there is more than 1 most common price? Without knowing the specifics of your requirements I might suggest taking an approach that bypasses that problem entirely by using the average as a starting point rather than the mode. Alternatively, you could use a COALESCE() function to attempt to get a mode and then use the average if that fails.

Here is an example of something that will output a value based on the clustering of values and avoids having to deal with weird mode related edge cases.

SELECT AVG(price) FROM prices
  JOIN (SELECT AVG(price) as rawAverage, STD(price) as deviation FROM prices) stats
  WHERE commodityFk = 1
  AND price BETWEEN 
    (rawAverage - deviation) AND (rawAverage + deviation);

This is obviously just a starting point but it is fairly scalable. You could easily change the expression in the JOIN clause to change how the boundaries of "near" values are defined.

Upvotes: 2

BizzyBob
BizzyBob

Reputation: 14750

You might find it helpful to return the full price list from SQL and create a PHP function that analyzes the list of prices to determine the "prices near the common price".

This would make it easy to tweak the criteria.

Perhaps something like this would be a start for you:

function findPricesNearCommonPrice($data)
{
    $pricesNearCommonPrice = Array();

    // find most common price
    $countOfEachValue = array_count_values($data);
    $mostCommonPrice = array_search(max($countOfEachValue), $countOfEachValue); // doesn't account for items that occur the same number of times, but you could make it do that :-)
    echo "Most Common Price: " . $mostCommonPrice . "<br><br>";

    $tolerance = .15; // 15%

    $minNearPrice = $mostCommonPrice * (1 - $tolerance);
    $maxNearPrice = $mostCommonPrice * (1 + $tolerance);

    foreach ($data as $p) {
        if ($p > $minNearPrice && $p < $maxNearPrice) {
            $pricesNearCommonPrice[] = $p;
        }
    }

    return $pricesNearCommonPrice;
}

Then if you do:

$data = Array(500000, 500000, 450000, 506980, 2000);

$values = findPricesNearCommonPrice($data);
$average = array_sum($values) / count($values);

echo "Prices near the most common price:<br>";
echo implode(", ",$values);

echo "<br><br>";
echo "Average: " . $average;

You get:

Most Common Price: 500000

Prices near the most common price:
500000, 500000, 450000, 506980

Average: 489245

Of course you'll need to modify it to fit your exact needs and format of your data, but hopefully it's a start.

Upvotes: 1

Related Questions