Ferit

Reputation: 648

Simple Regression of Time Series with Apache Commons Math in Java

I have a question about which time index to start from when running a simple regression on a time series. Here is my code, which runs the regression twice: once with the time index starting at t = 0 and once starting at t = 1.

package main;

import java.util.ArrayList;
import java.util.Arrays;

import org.apache.commons.math3.stat.regression.SimpleRegression;

public class RegressionTest {

    public static void main(String[] args) {

        SimpleRegression simpleRegression = new SimpleRegression();

        ArrayList<Double> timeSeries = new ArrayList<Double>(Arrays.asList(3.0,
                5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0));

        for(int i = 0; i < timeSeries.size(); i++) {
            simpleRegression.addData(i, timeSeries.get(i));
        }

        System.out.println("Start date unit at t = 0:");
        System.out.println("Intercept: " + simpleRegression.getIntercept());
        System.out.println("Slope    : " + simpleRegression.getSlope());


        simpleRegression = new SimpleRegression();

        for(int i = 0; i < timeSeries.size(); i++) {
            simpleRegression.addData((i+1), timeSeries.get(i));
        }

        System.out.println("\nStart date unit at t = 1:");
        System.out.println("Intercept: " + simpleRegression.getIntercept());
        System.out.println("Slope    : " + simpleRegression.getSlope());

    }
}

The output I get is:

Start date unit at t = 0:
Intercept: 2.8222222222222224
Slope    : 0.6

Start date unit at t = 1:
Intercept: 2.2222222222222223
Slope    : 0.6

As you can see, the intercept differs. So my question is: what is the correct starting index when no dates are specified for the time series?

Thanks for your answer.

Upvotes: 0

Views: 2032

Answers (2)

Zielu

Reputation: 8562

You just moved your line one unit to the right (you changed x for the first point from 0 to 1), so of course your intercept is different and the slope is the same (plot it if you don't see it).

A time series, as the name suggests, is a series of data for given times, so it must have a time (the x, or first parameter of addData) and the function value for that time (the y, or second parameter of addData).

You should know what the times are for your data, whether they start at 0, 1, or maybe 1345454. You must provide pairs of values (x, y) for the regression.
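To make the shift concrete, here is a minimal sketch that fits the series from the question against x values starting at an arbitrary index. It computes the least-squares line by hand rather than through SimpleRegression, only so that it runs without any dependency; the class name ShiftedRegressionSketch and the helpers fit and fitFrom are hypothetical names made up for this illustration. Moving the start from 0 to 1 leaves the slope alone and lowers the intercept by exactly one slope, which matches the output in the question (2.822 - 0.6 = 2.222).

```java
public class ShiftedRegressionSketch {

    /** Ordinary least-squares fit of y = intercept + slope * x; returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Fits the series against x = start, start + 1, start + 2, ... */
    static double[] fitFrom(int start, double[] y) {
        double[] x = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            x[i] = start + i;
        }
        return fit(x, y);
    }

    public static void main(String[] args) {
        // Same values as the time series in the question.
        double[] series = {3.0, 5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0};

        for (int start : new int[] {0, 1, 1345454}) {
            double[] line = fitFrom(start, series);
            System.out.printf("start = %d: intercept = %.4f, slope = %.4f%n",
                    start, line[0], line[1]);
        }
    }
}
```

Whatever start you pick, the fitted line passes through the same points relative to your data; only the place where it crosses x = 0 changes. So the "correct" start is whichever one matches the real times of your observations.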

Upvotes: 1

duffymo

Reputation: 308988

I don't think regression is the right thing here. A statistician would say that regression applies when observations are independent. That's not the case with a time series: there's clearly a notion of order in time that breaks the "independent" assumption.

I wonder if a better idea would be a discrete Fourier transform. Examining the frequency content of the signal would be more meaningful.
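If you want to try that, here is a rough sketch of a naive discrete Fourier transform over the series from the question. It is a plain O(n^2) implementation written for illustration, not a library call; the class name DftSketch and the method magnitudes are hypothetical. With only nine samples the spectrum is very coarse, so treat it as a starting point only.

```java
public class DftSketch {

    /** Naive O(n^2) DFT of a real signal; returns the magnitude |X_k| for k = 0..n-1. */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                // X_k = sum over t of x[t] * exp(-2 * pi * i * k * t / n)
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    public static void main(String[] args) {
        // Same values as the time series in the question.
        double[] series = {3.0, 5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0};
        double[] mag = magnitudes(series);
        for (int k = 0; k < mag.length; k++) {
            System.out.printf("k = %d: |X_k| = %.4f%n", k, mag[k]);
        }
    }
}
```

The k = 0 term is just the sum of the samples. For real-valued input the magnitudes are mirrored around the middle, so only the first half of the output carries distinct information.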

Upvotes: 0
