Reputation: 459
Let's say that for the past few months we have been selling 1000 different products. We log the "performance" of each product (i.e. how much money it has generated so far that day) every 5 minutes. A day has 288 segments of 5 minutes. Our log looks like this:
prod_1 | 2013-03-28 | 1 | 0
prod_1 | 2013-03-28 | 2 | 9.90
prod_1 | 2013-03-28 | 3 | 19.80
prod_1 | 2013-03-28 | 4 | 19.80
...
prod_1 | 2013-03-28 | 287 | 2326.5
prod_1 | 2013-03-28 | 288 | 2326.5
So, on 28th March we sold 235 units of prod_1, and we can draw the curve of the product's progress throughout the day. Each product/date pair is our unique object, i.e. we do not connect different days of selling the same product. We have the same data for all of the products.
Let's say on 2013-03-29 we add a new product, prod_1001. The last line in our log for this product reads:
prod_1001 | 2013-03-29 | 153 | 804,6
Question: what machine learning algorithm should we use to predict the revenue that this specific product will have generated by the end of the day?
prod_1001 | 2013-03-29 | 288 | ???
Upvotes: 0
Views: 1662
Reputation: 947
Without being an expert, my feeling is that this is a time series problem, and as far as I know, Mahout doesn't have anything specific for doing time series (I mention this because you tagged the question as Mahout).
These links from the mailing lists should shed some light on the matter: link1, link2. They are from 2011, but I think the information still holds true.
The basic gist is that Mahout doesn't have it, but you could implement such a thing and contribute it to the project, or use statistical software better suited to the task, like R (link).
Upvotes: 0
Reputation: 7394
This isn't an algorithm, but I'd make the following suggestions about the kind of model you might use:
So: if you've seen a mean of 4 units sold per timeslice for prod_1001 so far today, your distribution over how many you'll sell in the next timeslice is Poisson(4). If the product sells for £4.99, your expected revenue in the next timeslice is £19.96, you have less than a 5% chance of making more than 8*£4.99 = £39.92, and so on. If there are 50 timeslices left today, then you expect to make 50*4*£4.99 = £998 more today.
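To make the arithmetic above concrete, here is a minimal Python sketch using only the standard library. The rate of 4 units per timeslice, the £4.99 price and the 50 remaining timeslices are the illustrative figures from this answer, not values taken from the question's log, and the helper names are my own.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def poisson_tail(k, rate):
    """P(X > k) for X ~ Poisson(rate)."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k + 1))


# Illustrative figures from the answer (assumptions, not real data).
rate = 4.0          # mean units sold per 5-minute timeslice so far today
price = 4.99        # unit price in pounds
slices_left = 50    # timeslices remaining today

expected_next = rate * price                # expected revenue in the next timeslice
prob_more_than_8 = poisson_tail(8, rate)    # chance of selling more than 8 units
expected_rest = slices_left * rate * price  # expected revenue for the rest of the day

print(f"expected revenue next slice: {expected_next:.2f}")
print(f"P(more than 8 units next slice): {prob_more_than_8:.4f}")
print(f"expected additional revenue today: {expected_rest:.2f}")
```

The expected-revenue figures only need the mean; the Poisson assumption matters for the tail probability, i.e. for how much uncertainty you attach to the forecast.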
You might ask how to incorporate the knowledge gleaned from the other products: my instinct is that the easiest way to do this is to use them to estimate an Empirical Bayes prior on the Poisson parameter. This means estimating the two parameters of a Gamma distribution on the Poisson rate; a simple criterion for that is to maximise the likelihood of the observations on the other 1000 products. Given this prior, you do Bayesian inference on the Poisson rate for product 1001, which is pretty straightforward because the Gamma prior is conjugate to the Poisson and the posterior predictive distribution has closed form.
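A rough sketch of that Empirical Bayes step might look like the following. Note two assumptions: the Gamma prior is fitted by the method of moments on per-product mean rates, which is a simpler stand-in for the maximum-likelihood fit suggested above, and all the numbers (the historical rates, and 81 units sold in the first 153 timeslices) are made up for illustration. With a Gamma(alpha, beta) prior (shape, rate) and S units observed over n timeslices, the posterior is Gamma(alpha + S, beta + n), and the expected count per future timeslice is the posterior mean.

```python
from statistics import mean, pvariance


def fit_gamma_prior(rates):
    """Method-of-moments Gamma fit: returns (shape alpha, rate beta).

    Stand-in for a maximum-likelihood fit; it ignores the sampling noise
    in each product's estimated rate.
    """
    m = mean(rates)
    v = pvariance(rates)
    return m * m / v, m / v


def posterior(alpha, beta, total_units, n_slices):
    """Conjugate Gamma-Poisson update after observing total_units in n_slices."""
    return alpha + total_units, beta + n_slices


def expected_remaining_units(alpha_post, beta_post, slices_left):
    """Posterior-mean rate times the number of timeslices still to come."""
    return slices_left * alpha_post / beta_post


# Hypothetical mean units-per-timeslice for previously sold products.
historical_rates = [0.5, 1.0, 2.0, 4.0, 0.8, 1.5, 3.0, 2.2]
alpha, beta = fit_gamma_prior(historical_rates)

# Hypothetical observation for the new product: 81 units in 153 timeslices.
a_post, b_post = posterior(alpha, beta, total_units=81, n_slices=153)
remaining = expected_remaining_units(a_post, b_post, slices_left=288 - 153)

print(f"prior mean rate: {alpha / beta:.3f}")
print(f"posterior mean rate: {a_post / b_post:.3f}")
print(f"expected units in the rest of the day: {remaining:.1f}")
```

Multiplying the expected remaining units by the unit price and adding the revenue logged so far gives a point forecast for timeslice 288. Early in the day the prior dominates; by timeslice 153 the product's own sales history carries most of the weight.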
Upvotes: 2