Reputation: 18595
Daniel Collicott:
If I understand you correctly, you want to calculate the optimal selling value of an item (or are you trying to calculate the real value?).
Sellers are quite naturally gaming the system (e.g. on eBay), trying to maximize their profits.
For this reason, I would avoid average/SD approaches: they are too sensitive to outliers created by particular selling tactics.
Game-theory-wise, I think clever sellers would estimate the highest likely selling price (maximal profits) by researching their competitors and their historical sales output: to find the sweet spot.
For this reason I would record a histogram of historical prices over all sellers and look at the distribution of prices, using something approaching the mode (i.e. the most common sale price) to determine the optimal price. Better still, I would weight prices by the profit (proportional to historical sales volume) of each individual seller.
I suspect this would be nearer to your optimal market value; if you are looking for the real market value then comment below or contact me at my machine learning firm
A more detailed explanation for the things referred to in @Daniel Collicott's post:
--> optimal selling value
--> real selling value
--> algorithms for both
Upvotes: 9
Views: 2747
Reputation: 5758
If all you want to do is normalise your dataset, i.e. converge on a set that reflects the mean, then you could use the kurtosis and skewness to characterise the structure of your dataset and help identify outliers: compute the metrics for each point using the rest of the dataset, aiming to minimise kurtosis while preserving the tendency of the skewness; reject extreme values and repeat until excluding a value doesn't significantly change the metrics.
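One way to read the leave-one-out procedure above is sketched below in Python. This is only an illustration under assumptions that are not in the answer: the names `shape` and `trim_by_kurtosis`, the `tolerance` of 2.0 and the `min_size` of 4 are all hypothetical, and "preserving the tendency of the skewness" is interpreted here simply as "do not flip its sign".

```python
def shape(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0  # all values identical: treat the shape as flat
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def trim_by_kurtosis(values, tolerance=2.0, min_size=4):
    """Repeatedly drop the value whose removal lowers kurtosis the most.

    tolerance and min_size are arbitrary, assumed stopping parameters.
    """
    kept = list(values)
    while len(kept) > min_size:
        skew, kurt = shape(kept)
        best_index, best_kurt = None, kurt
        for i in range(len(kept)):
            # Metrics for the dataset with the i-th value left out
            rest_skew, rest_kurt = shape(kept[:i] + kept[i + 1:])
            if skew * rest_skew < 0:
                continue  # removal would flip the direction of the skew
            if rest_kurt < best_kurt:
                best_index, best_kurt = i, rest_kurt
        # Stop once no removal changes the kurtosis significantly
        if best_index is None or kurt - best_kurt < tolerance:
            break
        del kept[best_index]
    return kept


bids = [12.34, 15.22, 14.18, 20.55, 15.88, 16.99, 13.50, 14.90, 102.55]
print(trim_by_kurtosis(bids))
```

The stopping threshold drives the result: a small `tolerance` keeps trimming into the body of the distribution, so it would need tuning against real data.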
But your problem is a bit more interesting:
Let me see if I've got this right: You have an imperfect understanding of the foobar market, but you have access to limited concrete information about it.
You want to use your limited dataset to predict hidden information about the market.
You need the Bayesian Average (see also Bayesian Inference).
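As a rough illustration, the common form of the Bayesian average shrinks the sample mean towards a prior belief about the price. The sketch below is in Python; `prior_mean` and `prior_weight` are assumed inputs (for example, the long-run market mean and how many observations you consider it to be worth), not values taken from this answer.

```python
def bayesian_average(observed_prices, prior_mean, prior_weight):
    """Shrink the sample mean towards prior_mean.

    prior_weight is the number of pseudo-observations the prior is worth:
    the larger it is, the more data is needed to move the estimate.
    """
    n = len(observed_prices)
    if prior_weight + n <= 0:
        raise ValueError("need a positive prior_weight or at least one price")
    return (prior_weight * prior_mean + sum(observed_prices)) / (prior_weight + n)


# Hypothetical numbers: three observed prices, one of them an outlier
todays_prices = [12.34, 15.66, 102.55]
print(bayesian_average(todays_prices, prior_mean=15.0, prior_weight=10))
```

With few observations the estimate stays close to the prior, so a single extreme price moves it far less than it would move a plain arithmetic mean.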
Let's assume you have 1000 prices per day.
For each day, compute the mean, mode, median, stdev, kurtosis and skewness. This gives you a handle on the shape of the market:
Comparing daily values will enable you to measure the health of the market.
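A minimal Python sketch of such a daily summary follows, using only the standard library. The function name `describe_day` is hypothetical, and the skewness and excess kurtosis are computed from population moments, which is one of several conventions.

```python
import statistics


def describe_day(prices):
    """Summarise one day's prices: location, spread and shape."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    mean = statistics.fmean(prices)
    stdev = statistics.pstdev(prices)  # population standard deviation
    if stdev == 0:
        raise ValueError("all prices identical; shape metrics are undefined")
    m3 = sum((p - mean) ** 3 for p in prices) / n
    m4 = sum((p - mean) ** 4 for p in prices) / n
    return {
        "mean": mean,
        "median": statistics.median(prices),
        "mode": statistics.mode(prices),  # first most common value
        "stdev": stdev,
        "skewness": m3 / stdev ** 3,
        "kurtosis": m4 / stdev ** 4 - 3.0,  # excess kurtosis
    }


print(describe_day([12.34, 15.22, 14.18, 20.55, 99.50, 15.88, 16.99, 102.55]))
```

Storing one such dictionary per day gives the trend data to compare; a strongly positive skewness, for instance, suggests a tail of inflated prices.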
Once you have a few weeks' worth of trend data (it gets better over time) you can start testing for true prices.
If the true prices are jumping around then either the sample size is too small or the market isn't functioning properly (i.e. some of the participants are paying above the value, selling below value, supply is restricted, purchase price isn't related to value, etc).
I've had a go at modelling used car prices (they're not homogeneous) and I did get some reasonable convergence, +/- 10%, but that was on a limited dataset. It would also seem to work with house prices, but not with commodities or football scores.
It's never going to give you a definitive predictive answer, especially not in an auction environment - but it should get you a lot closer to the true price than an arithmetic mean would.
Upvotes: 3
Reputation: 69
If I understand you correctly, you want to calculate the optimal selling value of an item (or are you trying to calculate the real value?).
Sellers are quite naturally gaming the system (e.g. on eBay), trying to maximize their profits.
For this reason, I would avoid average/SD approaches: they are too sensitive to outliers created by particular selling tactics.
Game-theory-wise, I think clever sellers would estimate the highest likely selling price (maximal profits) by researching their competitors and their historical sales output: to find the sweet spot.
For this reason I would record a histogram of historical prices over all sellers and look at the distribution of prices, using something approaching the mode (i.e. the most common sale price) to determine the optimal price. Better still, I would weight prices by the profit (proportional to historical sales volume) of each individual seller.
I suspect this would be nearer to your optimal market value; if you are looking for the real market value then comment below or contact me at my machine learning firm
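The histogram idea could be sketched roughly as follows, in Python. The function name `weighted_modal_price`, the `bin_width` parameter and the sample figures are all made up for illustration, and units sold stands in for the per-seller profit weighting.

```python
from collections import defaultdict


def weighted_modal_price(sales, bin_width=1.0):
    """Return the centre of the price bin with the largest total weight.

    sales is an iterable of (price, units_sold) pairs.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    histogram = defaultdict(float)
    for price, units_sold in sales:
        # Each sale adds its volume to the bin its price falls into
        histogram[int(price // bin_width)] += units_sold
    if not histogram:
        raise ValueError("no sales to analyse")
    # Heaviest bin wins; ties are broken in favour of the lowest-priced bin
    best_bin = min(histogram, key=lambda b: (-histogram[b], b))
    return (best_bin + 0.5) * bin_width


# Hypothetical (price, units_sold) history across all sellers
sales = [(12.34, 40), (15.22, 35), (14.18, 50), (15.88, 30), (99.50, 2), (102.55, 1)]
print(weighted_modal_price(sales, bin_width=5.0))
```

Because low-volume listings contribute little weight, a handful of inflated prices barely affects the result; the choice of `bin_width` does, so it is worth trying a few values.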
Upvotes: 2
Reputation: 5576
Dan, reading your comments I'm starting to think what you want can be achieved very simply. This is in C# but it is so simple it should be easy to understand:
using System.Collections.Generic;
using System.Linq;
// Arbitrary cut-off: keep prices up to 1.5x the lowest price.
const double reasonable_price_range = 1.5;
List<double> prices = new List<double> { 50.00, 51.00, 52.00, 100.00, 101.00, 102.00, 150.00, 151.00, 152.00 };
double min = prices.Min();
var reasonable_prices = (from p in prices where p <= min * reasonable_price_range select p).ToList();
Discard all numbers which are larger than the smallest price by a certain percentage (percentage is the best measure here IMO), then return the rest.
This should work for all your examples. The 1.5 constant is arbitrary and should probably be higher (the question is, if we know price X is reasonable, how high can the price go and still be considered reasonable?). However, this relies on there not being even a single low outlier - the lowest price on the list must be a reasonable one.
Of course, min * constant is not necessarily the optimal decision function, but if we can rely on the min never being an outlier, the problem becomes much simpler, as instead of grouping elements we can compare them to the minimum element in some way.
Upvotes: 2
Reputation: 154543
Ok, after a lot of struggling, here is a solution that seems to work regardless of how extreme (or not) the high outliers are. Bear in mind that my math knowledge is pretty raw, so take this with a grain of salt.
$prices = array
(
    'baz' => array(12.34, 15.66),
    'bar' => array(12.34, 102.55),
    'foo' => array(12.34, 15.66, 102.55, 134.66),
    'foobar' => array(12.34, 15.22, 14.18, 20.55, 99.50, 15.88, 16.99, 102.55),
);
foreach ($prices as $item => $bids)
{
    $average = average($bids);
    $standardDeviation = standardDeviation($bids);
    foreach ($bids as $key => $bid)
    {
        if ($bid > ($average + ($average - $standardDeviation)))
        {
            unset($bids[$key]);
        }
    }
    $prices[$item] = $bids;
}
print_r($prices);
function average($arguments)
{
    if (count($arguments) > 0)
    {
        return array_sum($arguments) / count($arguments);
    }
    return 0;
}
function standardDeviation($arguments)
{
    if (count($arguments) > 0)
    {
        $result = average($arguments);
        foreach ($arguments as $key => $value)
        {
            $arguments[$key] = pow($value - $result, 2);
        }
        return sqrt(average($arguments));
    }
    return 0;
}
Output (demo):
Array
(
[baz] => Array
(
[0] => 12.34
[1] => 15.66
)
[bar] => Array
(
[0] => 12.34
)
[foo] => Array
(
[0] => 12.34
[1] => 15.66
)
[foobar] => Array
(
[0] => 12.34
[1] => 15.22
[2] => 14.18
[3] => 20.55
[5] => 15.88
[6] => 16.99
)
)
Upvotes: 2
Reputation: 154543
Your first problem is pretty straightforward using the average and the standard deviation:
$prices = array
(
    'bar' => array(12.34, 102.55),
    'foo' => array(12.34, 15.66, 102.55, 134.66),
    'foobar' => array(12.34, 15.22, 14.18, 20.55, 99.50, 15.88, 16.99, 102.55),
);
foreach ($prices as $item => $bids)
{
    $average = call_user_func_array('Average', $bids);
    $standardDeviation = call_user_func_array('standardDeviation', $bids);
    foreach ($bids as $key => $bid)
    {
        if (($bid < ($average - $standardDeviation)) || ($bid > ($average + $standardDeviation)))
        {
            unset($bids[$key]);
        }
    }
    $prices[$item] = $bids;
}
print_r($prices);
Basically you just need to remove bids lower than avg - stDev or higher than avg + stDev.
And the actual functions (ported from my framework):
function Average()
{
    if (count($arguments = func_get_args()) > 0)
    {
        return array_sum($arguments) / count($arguments);
    }
    return 0;
}
function standardDeviation()
{
    if (count($arguments = func_get_args()) > 0)
    {
        $result = call_user_func_array('Average', $arguments);
        foreach ($arguments as $key => $value)
        {
            $arguments[$key] = pow($value - $result, 2);
        }
        return sqrt(call_user_func_array('Average', $arguments));
    }
    return 0;
}
Output (demo):
Array
(
[bar] => Array
(
[0] => 12.34
[1] => 102.55
)
[foo] => Array
(
[1] => 15.66
[2] => 102.55
)
[foobar] => Array
(
[0] => 12.34
[1] => 15.22
[2] => 14.18
[3] => 20.55
[5] => 15.88
[6] => 16.99
)
)
Upvotes: 7