Reputation: 402
I need to predict the next order quantity for any given customer.
My data follows the schema explained below. It consists of customer orders since mid-2018. There are over 2000 rows (not much data, but it's all I have).
Schema & explanation of fields:
CustomerId: Id of the customer from the DB
ProductId: Id of the product from the DB
ProductTypeId: Id of the sub-type of the product, e.g. if Water is the product, the sub-type can be Sparkling, Mineral, etc.
Quantity: The ordered quantity. This needs to be predicted.
CDate: The date on which the order was generated.
What I need is to supply ProductId, ProductTypeId, CustomerId and the CDate (this will be a future date) and get back the Quantity that the given customer is likely to order.
So far, I've tried to do this using the regression samples from the ML.NET website. They don't work, since Quantity is always predicted to be zero.
On researching further, I found that this is because of the CDate field.
After transforming this field to numbers using OneHotEncoding, the prediction was no longer zero, but it was not accurate either. Test data and predicted values were way off.
It turns out this is not the correct way to handle dates.
I tried to find resources where the prediction is based on a date and other features, but could not find any. The taxi-fare-prediction sample does not use a date. Other samples are not related to what I need.
Which solution can I use? Time series? How do I train a model if I want to predict purchases per customer, per product, per product type, and by date?
I am new to machine learning. Any pointers will help. I hate to ask, but a working solution in ML.NET would go a long way.
If it's not possible in ML.NET, then I'm open to using Python (new to this too!) and I am willing to learn.
Thank you.
Data file can be downloaded from here.
Upvotes: 2
Views: 1575
Reputation: 14389
To make a machine learn to predict something, you need to understand what drives the result yourself. Meaning:
*You can program a model only after you have a mental model yourself.
There are two meaningful contributions that I can make:
Feature Engineering:
You are using CustomerId, ProductId, ProductTypeId, CDate to predict the Quantity of product. Nobody stops you from creating a model that takes this set of inputs to generate the output, but do these inputs have a correlation with the output?
It doesn't seem that way to me. I think that to make a sensible model you will need better input variables. Some of them could be the size of the previous order, the turnover of the potential buyer, etc. Those factors are likely to give a better output.
So, consider improving the input features.
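As a rough illustration of what "better input variables" could look like, here is a minimal Python sketch (standard library only) that derives calendar features from CDate and "previous order" features per customer/product/product type. The function name build_features, the feature names, the placeholder values 0 and -1, and the three sample orders are all made up for illustration; they are not taken from the asker's data file.

```python
from datetime import date

def build_features(orders):
    """Turn raw order rows into feature rows, in chronological order.

    `orders` is a list of dicts with the question's fields:
    CustomerId, ProductId, ProductTypeId, Quantity, CDate (a datetime.date).
    """
    last_seen = {}  # (customer, product, product type) -> (previous date, previous quantity)
    rows = []
    for o in sorted(orders, key=lambda o: o["CDate"]):
        key = (o["CustomerId"], o["ProductId"], o["ProductTypeId"])
        prev = last_seen.get(key)
        rows.append({
            # Calendar features instead of one-hot encoding the raw date.
            "month": o["CDate"].month,
            "day_of_week": o["CDate"].weekday(),  # Monday = 0
            # History features; 0 and -1 are arbitrary "no history" placeholders.
            "prev_quantity": prev[1] if prev else 0,
            "days_since_prev": (o["CDate"] - prev[0]).days if prev else -1,
            "target": o["Quantity"],
        })
        last_seen[key] = (o["CDate"], o["Quantity"])
    return rows

# Invented sample rows, only to show the shape of the output.
sample_orders = [
    {"CustomerId": 1, "ProductId": 7, "ProductTypeId": 2, "Quantity": 10, "CDate": date(2018, 7, 2)},
    {"CustomerId": 2, "ProductId": 7, "ProductTypeId": 2, "Quantity": 5, "CDate": date(2018, 7, 10)},
    {"CustomerId": 1, "ProductId": 7, "ProductTypeId": 2, "Quantity": 12, "CDate": date(2018, 7, 16)},
]
features = build_features(sample_orders)
```

Each row in features can then be fed to a regression model with "target" as the label. Turnover of the buyer is not in the given schema, so it is left out here.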
Model Selection:
In this case, it seems like an ensemble would work better than a single model. In particular, Linear Regression and Decision Trees seem relevant.
There is no shortcut I can hand you. To build intuition about which models to use and when to use them, you will have to try your hand at them multiple times.
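As a starting point for experimenting, here is a deliberately tiny sketch of the ensemble idea in plain Python: it averages a one-feature least-squares line with a depth-1 regression tree (a "stump"). The helper names fit_linear, fit_stump and fit_ensemble and the toy numbers are assumptions made for illustration; a real project would more likely use a library's regressors with several features.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:  # every x is identical: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def fit_ensemble(xs, ys):
    """Average the predictions of the two models above."""
    models = [fit_linear(xs, ys), fit_stump(xs, ys)]
    return lambda x: mean(m(x) for m in models)

# Toy data: x could be the previous order quantity, y the next one.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
predict = fit_ensemble(xs, ys)
prediction = predict(5)  # the line extrapolates, the stump cannot, the ensemble sits in between
```

The point of averaging is that the two models fail in different ways: the line can extrapolate a trend but misses step changes, while the tree captures step changes but never predicts outside the range it has seen.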
Finally, to train the models, there is a standard approach (k-fold cross-validation, here with k = 5). You divide the input data into 5 parts (i.e. 20% each). Then you train the model on four parts and test it on the fifth part. Next you hold out a different part and train on the other four, and so on, until every part has been used for testing once.
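The five-part procedure can be sketched in a few lines of plain Python. The function name k_fold_splits is made up for this example, and it only produces row indices; fitting and scoring a model on each split is left to whichever library you end up using.

```python
def k_fold_splits(n_rows, k=5):
    """Yield (train_indices, test_indices) once for each of the k parts."""
    indices = list(range(n_rows))
    # Spread any remainder over the first parts so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# With 10 rows and k=5, each part holds 2 rows (20%).
folds = list(k_fold_splits(10, k=5))
for train, test in folds:
    # Train on `train`, evaluate on `test`, then average the five scores.
    pass
```

This sketch splits rows in their existing order. Whether to shuffle first, or to split by date so that the model is never tested on orders older than its training data, is a separate decision that depends on your data.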
*Not true for Neural Networks. The hidden layers take away the ability to truly understand the predictions.
Upvotes: 1