Reputation: 985
Currently I am working on a school project, and I have a somewhat unusual task. My job is to scrape data from a certain Facebook page and put it into a learning model, which should take a List as input and produce an Int32 as output.
First, let me briefly explain the data structure I have already designed: a Dictionary<string, Tuple<double[], int>>, which represents postId:[wordWeights],amountOfLikes, for example:
23425234_35242352:[0.027,0.031,0.009,0.01233],89
I have to train my model on different posts and their likes. For this purpose, I have chosen the Accord.NET library for C#, and so far I have analyzed its SimpleLinearRegression class.
First, I saw that I can use OrdinaryLeastSquares and feed it with inputs and outputs like this:
double[] inputs = { 0.123, 0.23, 0.09 };
double[] outputs = { 98, 0, 0 };
OrdinaryLeastSquares ols = new OrdinaryLeastSquares();
SimpleLinearRegression regression = ols.Learn(inputs, outputs);
As you can see, the number of inputs must match the number of outputs, so I padded the output array with zeros. As a result, the output was obviously wrong. I cannot come up with a proper way of feeding my data to the linear regression class. I know that padding the array with zeros is wrong, but it is the only solution I have come up with so far. I would appreciate it if anyone could tell me how I should use regression in this case and help me choose a proper algorithm. Cheers!
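The underlying problem is the shape of the data. Ordinary least squares expects one row per post (a double[][]) and one like count per row, not a single weight vector with padded outputs. It also needs at least as many posts as weights + 1, otherwise the normal equations are singular. The minimal sketch below is my own plain C# illustration, not Accord.NET code. It fits likes ≈ b0 + b1·w1 + … + bk·wk by solving (XᵀX)β = Xᵀy with Gauss-Jordan elimination. `FitOls` is a hypothetical helper, and the data is made up so that the exact answer is known. As far as I know from the Accord.NET docs, the same row-per-post layout goes into `OrdinaryLeastSquares.Learn(double[][], double[])`, which returns a `MultipleLinearRegression`. Treat that as something to verify against your Accord version.

```csharp
using System;
using System.Linq;

// Hypothetical helper: ordinary least squares with an intercept.
// Each row of x is one post's word weights; y[i] is that post's like count.
// Returns [b0, b1, ..., bk] with the intercept first.
static double[] FitOls(double[][] x, double[] y)
{
    if (x.Length == 0 || x.Length != y.Length)
        throw new ArgumentException("Need exactly one output per input row.");

    int n = x.Length, p = x[0].Length + 1; // +1 column for the intercept
    var a = new double[p, p + 1];          // augmented matrix [XᵀX | Xᵀy]

    for (int i = 0; i < n; i++)
    {
        // Design-matrix row with a leading 1 for the intercept.
        double[] row = new[] { 1.0 }.Concat(x[i]).ToArray();
        for (int r = 0; r < p; r++)
        {
            for (int c = 0; c < p; c++) a[r, c] += row[r] * row[c];
            a[r, p] += row[r] * y[i];
        }
    }

    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < p; col++)
    {
        int pivot = col;
        for (int r = col + 1; r < p; r++)
            if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
        if (Math.Abs(a[pivot, col]) < 1e-12)
            throw new InvalidOperationException("XᵀX is singular: add more (or less collinear) posts.");
        for (int c = 0; c <= p; c++) (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);

        for (int r = 0; r < p; r++)
        {
            if (r == col) continue;
            double f = a[r, col] / a[col, col];
            for (int c = col; c <= p; c++) a[r, c] -= f * a[col, c];
        }
    }

    return Enumerable.Range(0, p).Select(k => a[k, p] / a[k, k]).ToArray();
}

// Made-up data generated from likes = 5 + 10*w1 + 20*w2 (no noise),
// one row per post and one like count per row.
double[][] weights =
{
    new[] { 1.0, 0.0 },
    new[] { 0.0, 1.0 },
    new[] { 1.0, 1.0 },
    new[] { 2.0, 1.0 },
};
double[] likes = { 15.0, 25.0, 35.0, 45.0 };

double[] beta = FitOls(weights, likes);
Console.WriteLine($"intercept={beta[0]:F2}, w1={beta[1]:F2}, w2={beta[2]:F2}");
```

Because the data has no noise, the fit should recover the generating coefficients. The design point is that each post contributes a whole row, so no padding is ever needed.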
Upvotes: 3
Views: 895
Reputation: 985
After browsing different regression algorithms in Accord.NET, I settled on FanChenLinSupportVectorRegression, which is part of the Accord.NET Machine Learning library. It is named after Fan, Chen and Lin, the authors of the SVM training method it is based on (the same method LIBSVM uses).
The algorithm performs support vector regression (SVR), the regression counterpart of support vector machines (SVMs).
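For reference, this is the standard ε-insensitive SVR problem, the formulation LIBSVM solves. I am assuming Accord's class targets the same one. Here xᵢ is post i's word-weight vector, yᵢ its like count, φ the feature map implied by the kernel k, and C the penalty for errors larger than ε. Errors inside the ε-tube are not penalized at all. The prediction f(x) is what scoring a trained model evaluates:

```latex
\min_{w,\,b,\,\xi,\,\xi^*}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)
\quad\text{s.t.}\quad
\begin{cases}
y_i - \left(w^\top\phi(x_i) + b\right) \le \varepsilon + \xi_i,\\
\left(w^\top\phi(x_i) + b\right) - y_i \le \varepsilon + \xi_i^*,\\
\xi_i \ge 0,\ \xi_i^* \ge 0,
\end{cases}
\qquad
f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right) k(x_i, x) + b
```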
The class is generic, FanChenLinSupportVectorRegression<TKernel>, and its Kernel property gets or sets the kernel function used to create a kernel support vector machine. If this property is set, UseKernelEstimation will be set to false.
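To see what the kernel contributes, here is a plain C# sketch of the Gaussian (RBF) kernel, k(x, z) = exp(−‖x − z‖² / (2σ²)). As far as I know, this is the form Accord's `Gaussian` kernel uses, with σ = 1 by default. Treat both points as assumptions, and note that `GaussianKernel` is a hypothetical helper, not Accord's API. The sample vectors are taken from the example below.

```csharp
using System;

// Hypothetical helper: Gaussian (RBF) kernel
// k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).
// Returns values in (0, 1]: identical vectors give 1, distant vectors approach 0.
static double GaussianKernel(double[] x, double[] z, double sigma = 1.0)
{
    if (x.Length != z.Length)
        throw new ArgumentException("Vectors must have the same length.");
    double squaredDistance = 0.0;
    for (int i = 0; i < x.Length; i++)
    {
        double d = x[i] - z[i];
        squaredDistance += d * d;
    }
    return Math.Exp(-squaredDistance / (2.0 * sigma * sigma));
}

// Compare the post { 3.0, 1.0 } with itself, a close post and a distant one.
double same = GaussianKernel(new[] { 3.0, 1.0 }, new[] { 3.0, 1.0 });
double near = GaussianKernel(new[] { 3.0, 1.0 }, new[] { 3.0, 2.0 });
double far  = GaussianKernel(new[] { 3.0, 1.0 }, new[] { 7.0, 1.0 });
Console.WriteLine($"same={same}, near={near:F4}, far={far:E2}");
```

One consequence: real tf-idf weights such as 0.027 and 0.031 are tiny, so with σ = 1 every pair of posts gives a kernel value close to 1 and looks almost identical to the model. This suggests that σ should be tuned to the scale of the data, or estimated automatically via UseKernelEstimation (as the name suggests, it estimates the kernel from the data).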
The Learn method takes as its first argument an array of double arrays (in our case, the word weights of each post) and as its second argument an array of doubles containing the number of likes.
IMPORTANT: each sub-array of weights MUST correspond to the like count at the same position in the second argument. The first sub-array has its like count at index [0] of the likes array, the second sub-array has its like count at index [1], and so on.
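Starting from the question's dictionary, one way to guarantee that alignment is to build both arrays in a single pass over the same entries. Here is a minimal sketch. It assumes the Dictionary<string, Tuple<double[], int>> layout, and it assumes that every post's weights are computed over the same vocabulary, so all vectors have the same length (both OLS and SVR need that). `ToTrainingArrays` is a hypothetical helper. The second post ID and its weights are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns postId -> (wordWeights, likes) into the two
// parallel arrays Learn expects. Row i of Inputs and Outputs[i] always
// come from the same post because both are built from one list.
static (double[][] Inputs, double[] Outputs) ToTrainingArrays(
    Dictionary<string, Tuple<double[], int>> byPostId)
{
    var rows = byPostId.Values.ToList();

    // Assumption: all posts share one vocabulary, so vectors have equal length.
    int width = rows.Count > 0 ? rows[0].Item1.Length : 0;
    if (rows.Any(r => r.Item1.Length != width))
        throw new ArgumentException("All word-weight vectors must have the same length.");

    double[][] features = rows.Select(r => r.Item1).ToArray();
    double[] likeCounts = rows.Select(r => (double)r.Item2).ToArray();
    return (features, likeCounts);
}

// The first entry is the question's example; the second is made up.
var posts = new Dictionary<string, Tuple<double[], int>>
{
    ["23425234_35242352"] = Tuple.Create(new[] { 0.027, 0.031, 0.009, 0.01233 }, 89),
    ["23425234_35242353"] = Tuple.Create(new[] { 0.010, 0.000, 0.042, 0.00500 }, 12),
};

var (inputs, outputs) = ToTrainingArrays(posts);
Console.WriteLine($"{inputs.Length} input rows, {outputs.Length} outputs");
```

The resulting `inputs` and `outputs` can then be passed straight to `Learn`.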
Example:
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Kernels;
//Suppose these are posts with tf-idf weights
double[][] inputs =
{
new[] { 3.0, 1.0 },
new[] { 7.0, 1.0 },
new[] { 3.0, 1.0 },
new[] { 3.0, 2.0 },
new[] { 6.0, 1.0 },
};
//Number of likes each post received: outputs[i] belongs to inputs[i]
double[] outputs = { 2.0, 3.0, 4.0, 11.0, 6.0 };
//Create the learning algorithm: FanChenLinSupportVectorRegression<TKernel> with a Gaussian kernel
var model = new FanChenLinSupportVectorRegression<Gaussian>();
//Train the model on the tf-idf vector of each post and its like count
var svm = model.Learn(inputs, outputs);
//Score a new tf-idf vector to get a predicted like count
double result = svm.Score(new double[] { 2.0, 6.0 });
I have tested this model by scoring inputs that contain the same values in swapped order, and the predictions were reasonably accurate. The model also works on larger inputs, although it requires more training. Hope this helps someone in the future.
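Two small follow-ups, sketched in plain C#. First, the question asks for an Int32, but `Score` returns a double, so the prediction needs converting. Second, "reasonably accurate" can be quantified by scoring posts that were not used for training and computing the mean absolute error in likes. In this sketch, rounding to the nearest integer and clamping at zero are my own choices, since a negative like count is impossible. `ToLikeCount` and `MeanAbsoluteError` are hypothetical helpers, and the numbers are illustrative, not results from the model above.

```csharp
using System;
using System.Linq;

// Hypothetical helper: converts a regression score to an Int32 like count.
// Rounds to the nearest integer and clamps at zero (likes cannot be negative).
static int ToLikeCount(double score) =>
    (int)Math.Max(0.0, Math.Round(score, MidpointRounding.AwayFromZero));

// Hypothetical helper: mean absolute error, the average |predicted - actual| in likes.
static double MeanAbsoluteError(double[] predicted, double[] actual)
{
    if (predicted.Length == 0 || predicted.Length != actual.Length)
        throw new ArgumentException("Need two non-empty arrays of equal length.");
    return predicted.Zip(actual, (p, a) => Math.Abs(p - a)).Average();
}

// Illustrative numbers only: scores for held-out posts and their real like counts.
double[] scores = { 2.4, 3.6, -0.7, 10.2 };
double[] actualLikes = { 2.0, 4.0, 0.0, 11.0 };

int[] likeCounts = scores.Select(ToLikeCount).ToArray();
Console.WriteLine(string.Join(" ", likeCounts));
Console.WriteLine($"MAE = {MeanAbsoluteError(scores, actualLikes):F3} likes");
```

Computing the error on the raw scores rather than the rounded counts keeps the metric from hiding small systematic biases.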
Upvotes: 1