Chad Johnson

Reputation: 21895

How can I use this Node.js library to calculate a polynomial regression?

I need an algorithm to calculate a polynomial regression given an input vector. I found this Node.js library which seems to provide what I need.

Looking at the documentation, I see I need to pass a two-dimensional data array:

var data = [[0,1],[32, 67] .... [12, 79]];
var result = regression('polynomial', data, 4);

But I'm unclear about two things:

  1. Why is the input data two-dimensional?
  2. What are the values in each array meant to be? In the example, what does [0,1] represent (which variable is 0 and which is 1)?

Basically, this algorithm is intended to calculate data for an indicator used in stock market analysis. So my input is an array of prices: [14.26, 14.27, 14.27, 14.28, 14.29, 14.27, 14.27, 14.28, ...].

Upvotes: 0

Views: 1800

Answers (1)

Jefferey Cave

Reputation: 2889

A little late to the game, but this may be useful to others.

I would suggest reading up on what regressions are and how they are calculated, and working a few out by hand:

https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data

The short answer is to create an array with both x and y values:

for(var i=data.length-1; i>=0 ; i--){
    data[i] = [i,data[i]];
}
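
To make the two questions concrete: each inner array is one observation, [x, y], and for a plain list of prices the array index can stand in for x. Here is a non-mutating variant of the loop above, as a small sketch (`prices` and `pairs` are names chosen here for illustration; the values are the first few from the question):

```javascript
// Prices from the question; the array index doubles as x.
var prices = [14.26, 14.27, 14.27, 14.28, 14.29];

// Build [x, y] pairs without overwriting the original array.
var pairs = prices.map(function (price, i) {
    return [i, price];
});

console.log(pairs[0]); // first pair: x = 0, y = 14.26
```

`pairs` can then be passed as the data argument, as in `regression('polynomial', pairs, 4)`.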

The big issue I see with the original data is that it ignores "x" altogether. "x" is most likely the date in this case, but by storing only the prices in an array we are assuming the dates are contiguous. That seems unlikely given the data you are working with: I would expect gaps in the data during weekends and holidays.
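
To illustrate the gap problem without any library, here is a rough sketch that turns dated records into [x, y] pairs where x is the number of calendar days since the earliest record. The helper names (`toUtcMillis`, `toDayPairs`) and the sample prices are made up for this example; only the `Date`/`Close` record shape is taken from the data further down.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse "YYYY-MM-DD" as a UTC timestamp so daylight-saving shifts
// cannot distort the day count.
function toUtcMillis(isoDate) {
    var parts = isoDate.split('-');
    return Date.UTC(Number(parts[0]), Number(parts[1]) - 1, Number(parts[2]));
}

// Convert records to [dayOffset, close] pairs, where dayOffset is the
// number of calendar days since the earliest record.
function toDayPairs(records) {
    var times = records.map(function (rec) { return toUtcMillis(rec.Date); });
    var first = Math.min.apply(null, times);
    return records.map(function (rec, i) {
        return [Math.round((times[i] - first) / MS_PER_DAY), rec.Close];
    });
}

// Hypothetical sample: a Friday, then the following Monday and Tuesday.
var sample = [
    { Date: '2016-09-16', Close: 10.0 },
    { Date: '2016-09-19', Close: 10.5 },
    { Date: '2016-09-20', Close: 10.4 }
];
var dayPairs = toDayPairs(sample);
console.log(dayPairs.map(function (p) { return p[0]; }).join(',')); // prints "0,3,4"
```

With a plain array index the x values would be 0, 1, 2; using day offsets keeps the weekend visible to the regression.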

Based on that assumption, here is an example of how to achieve what you are talking about. The same approach is useful for many kinds of data:

// get some stock data: daily close prices
var data = [{Date:"2016-09-23",Close:85.81},{Date:"2016-09-22",Close:86.35},...];

// Before we can use this data, we need to format 
// it for display and processing. Basically, just 
// make sure that everything is in the right order 
// and such...
var metrics = {
    mindate : moment(data[0].Date),
    maxdate : moment(data[0].Date),
};
data.forEach(function(rec){
    // cast the date to something more friendly
    rec.Date = moment(rec.Date);
    // get the min and max
    if(metrics.mindate.isAfter (rec.Date)){ metrics.mindate = moment(rec.Date); }
    if(metrics.maxdate.isBefore(rec.Date)){ metrics.maxdate = rec.Date; }
});
metrics.days = metrics.maxdate.diff(metrics.mindate,'days');

// One thing you will notice about stock data is
// that the markets are closed on weekends and
// holidays. We have no price for those days, and
// the distance between Friday's and Monday's
// datapoints is not 1 day. Storing each record's
// offset in days from the first date makes the
// gaps explicit.
data.forEach(function(rec){
    rec.day = rec.Date.diff(metrics.mindate,'days');
});
// The gap-filling loop further down assumes the
// records are in ascending date order, so sort now
data.sort(function(a,b){ return a.day - b.day; });

// At this point the data is probably ready
// for the regression library, we just need
// to format it correctly
var d = [];
data.forEach(function(rec){
    var x = rec.day;
    var y = rec.Close;
    d.push([x,y]);
});
var result = regression('polynomial', d, 4);


// Now that the regression has been calculated, we
// can make use of it. First let's determine what
// data was missing from our original dataset
// (basically the weekends and holidays)
for(var i=data.length-2; i>=0;i--){
    for(var day=data[i+1].day-1;day>data[i].day;day--){
        data.push({
            day:day,
            Date:moment(metrics.mindate).add(day,'days'),
            Close:null,
            Est:null
        });
    }
}
// While we are at it, let's project 30 days into
// the future (starting the day after the last
// known date, so that day is not duplicated)
for(var day=metrics.days+30; day>metrics.days; day--){
        data.push({
            day:day,
            Date:moment(metrics.mindate).add(day,'days'),
            Close:null,
            Est:null
        });
}

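// for convenience's sake, copy the estimates back
// into our original dataset (this assumes
// result.equation lists the coefficients from
// lowest power to highest; check the version of
// regression.js you are using)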
data.forEach(function(rec){
    rec.Est = 0;
    for(var i=result.equation.length-1; i>=0; i--){
        rec.Est += result.equation[i] * Math.pow(rec.day,i);
    }
});

// better sort it at this point
data.sort(function(a,b){
    if(a.day < b.day) return -1;
    if(a.day > b.day) return 1;
    return 0;
});
// Now that processing is complete, we can use
// the data in some manner that is meaningful
// in this case, I display the data with the
// gaps filled in, as well as a projection
data.forEach(function(rec){
    $('table#prices').append(
        '<tr><td>{{day}}</td><td>{{Date}}</td><td>{{Close}}</td><td>{{Est}}</td></tr>'
            .replace(/{{day}}/g,rec.day)
            .replace(/{{Date}}/g,rec.Date.format('YYYY-MM-DD'))
            .replace(/{{Close}}/g,rec.Close || '')
            .replace(/{{Est}}/g,rec.Est || '')
    );
});
$('pre#est').html(
    metrics.mindate.format('YYYY-MM-DD') 
    + ' -> ' + metrics.maxdate.format('YYYY-MM-DD') 
    + ' (' + metrics.days + ' days)'
    + '\n' + result.string
);

That's a lot to handle, so here's a fiddle:

https://jsfiddle.net/v95evuv8/6/

The fiddle depends on regression.js and moment.js (for date manipulation).
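
If you are curious what a polynomial regression does with those [x, y] pairs, here is a minimal, dependency-free sketch of a least-squares fit using the normal equations and Gaussian elimination. This is not the code from regression.js, just an illustration; `fitPolynomial` and `evaluate` are names invented here, and there is no handling for degenerate input (for example, fewer distinct x values than coefficients).

```javascript
// Fit y = c[0] + c[1]*x + ... + c[order]*x^order by least squares.
// points is an array of [x, y] pairs; returns coefficients, lowest power first.
function fitPolynomial(points, order) {
    var n = order + 1;
    var a = []; // augmented matrix of the normal equations: n rows, n + 1 columns
    var row, col, i, k;

    for (row = 0; row < n; row++) {
        a[row] = [];
        for (col = 0; col < n; col++) {
            // sum over all points of x^(row + col)
            a[row][col] = points.reduce(function (s, p) {
                return s + Math.pow(p[0], row + col);
            }, 0);
        }
        // right-hand side: sum over all points of y * x^row
        a[row][n] = points.reduce(function (s, p) {
            return s + p[1] * Math.pow(p[0], row);
        }, 0);
    }

    // Forward elimination with partial pivoting.
    for (i = 0; i < n; i++) {
        var pivot = i;
        for (k = i + 1; k < n; k++) {
            if (Math.abs(a[k][i]) > Math.abs(a[pivot][i])) { pivot = k; }
        }
        var tmp = a[i]; a[i] = a[pivot]; a[pivot] = tmp;
        for (k = i + 1; k < n; k++) {
            var factor = a[k][i] / a[i][i];
            for (col = i; col <= n; col++) {
                a[k][col] -= factor * a[i][col];
            }
        }
    }

    // Back substitution.
    var coeffs = new Array(n);
    for (i = n - 1; i >= 0; i--) {
        var sum = a[i][n];
        for (k = i + 1; k < n; k++) { sum -= a[i][k] * coeffs[k]; }
        coeffs[i] = sum / a[i][i];
    }
    return coeffs;
}

// Evaluate the fitted polynomial at x using Horner's method.
function evaluate(coeffs, x) {
    var y = 0;
    for (var i = coeffs.length - 1; i >= 0; i--) { y = y * x + coeffs[i]; }
    return y;
}

// Points sampled exactly from y = 1 + 2x + 3x^2.
var pts = [[0, 1], [1, 6], [2, 17], [3, 34], [4, 57]];
var c = fitPolynomial(pts, 2);       // approximately [1, 2, 3]
var projected = evaluate(c, 5);      // approximately 86
```

The projection step in the long example above is the same idea as `evaluate`: plug a future day number into the fitted polynomial. Keep in mind that a high-order polynomial can swing wildly outside the range of the data it was fitted to.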

The comments contain questions about the utility of doing this. I do something similar in CouchDB when aggregating sets of results (reduce to a regression parameter and monitor for regressions that deviate from expectation), and I also often display a regression line in charts on web-based reports.

Upvotes: 1
