T. Holmström

Reputation: 153

Polynomial fit doesn't plot high degrees

I'm working on regression and tried to fit polynomial models of three different degrees to my data, but it plots only the lowest degrees. I have no idea where I'm going wrong. Here are my code and data points:

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
import time

def main():
    inputfile = "statistics.txt"
    X = np.loadtxt(inputfile, delimiter=",", dtype=str, usecols=[0])
    X= [dt.datetime.strptime(date, '%d.%m.%Y') for date in X]
    X = mdates.date2num(X)
    Y= np.loadtxt(inputfile, delimiter=",", usecols=[1])
    num_training = int(0.9*len(X))
    num_test = len(X) - num_training
    X_train, Y_train = X[:num_training], Y[:num_training]
    X_test, Y_test = X[num_training:], Y[num_training:]
    plt.scatter(X_train, Y_train, color="blue",s=10, marker='o')
    plt.title("Euro Swedish Krona Exchange rate")
    plt.xlabel("Time in months from April to June in 2017")
    plt.ylabel("Exchange rate")
    colors = ['teal', 'yellowgreen', 'gold']
    for count, degree in enumerate([2, 3, 4]):
        coeffs = np.polyfit(X_train, Y_train, degree)
        f = np.poly1d(coeffs)
        x_line = np.linspace(X[0], X[-1], 50)
        x_line_plot = mdates.num2date(x_line)
        y_line = f(x_line)
        plt.plot(x_line_plot, y_line, color=colors[count], linewidth=2, label="degree {}".format(degree))
        print(coeffs)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    plt.gca().xaxis.set_major_locator(mdates.MonthLocator())    
    plt.grid()
    plt.legend(loc='upper left')
    plt.show()

if __name__ == '__main__':
    main()

-----statistics.txt-----

11.04.2017,9.6059
12.04.2017,9.5741
13.04.2017,9.5976
14.04.2017,9.5892
17.04.2017,9.5763
18.04.2017,9.6101
19.04.2017,9.6107
20.04.2017,9.6309
21.04.2017,9.6611
24.04.2017,9.6266
25.04.2017,9.5858
26.04.2017,9.5551
27.04.2017,9.6070
28.04.2017,9.6474
01.05.2017,9.6438
02.05.2017,9.6220
03.05.2017,9.6326
04.05.2017,9.7007
05.05.2017,9.6669
08.05.2017,9.6616
09.05.2017,9.6649
10.05.2017,9.6974
11.05.2017,9.6489
12.05.2017,9.6480
15.05.2017,9.6903
16.05.2017,9.7402
17.05.2017,9.7432
18.05.2017,9.7797
19.05.2017,9.7800
22.05.2017,9.7683
23.05.2017,9.7363
24.05.2017,9.7255
25.05.2017,9.7378
26.05.2017,9.7233
29.05.2017,9.7138
30.05.2017,9.7580
31.05.2017,9.7684
01.06.2017,9.7402
02.06.2017,9.7256
05.06.2017,9.7388
06.06.2017,9.7707
07.06.2017,9.7833
08.06.2017,9.7685
09.06.2017,9.7579
12.06.2017,9.7980
13.06.2017,9.7460
14.06.2017,9.7634
15.06.2017,9.7540
16.06.2017,9.7510
19.06.2017,9.7475
20.06.2017,9.7789
21.06.2017,9.7676
22.06.2017,9.7581
23.06.2017,9.7629
26.06.2017,9.7537
27.06.2017,9.7647
28.06.2017,9.7213
29.06.2017,9.6806
30.06.2017,9.6309
03.07.2017,9.6479
04.07.2017,9.6740
05.07.2017,9.6332
06.07.2017,9.6457
07.07.2017,9.6084
10.07.2017,9.6101
11.07.2017,9.6299

I think it has something to do with the dates, because the plot worked when I didn't use dates. There may also be unnecessary things in my code. It's also weird that when I change the degree values, I sometimes get 3 curves and sometimes only 1.

Upvotes: 1

Views: 868

Answers (1)

Warren Weckesser

Reputation: 114831

As @DavidG pointed out in a comment, the three curves are very close, so they look the same unless you zoom in.

That is just a symptom of the problem. You probably noticed the warnings that were printed when you ran the code. These indicate a numerical problem occurring in polyfit. Your X values (the date numbers) are large, and their spread is tiny compared to their size, so their powers are nearly proportional to one another and the least-squares problem that polyfit solves becomes poorly conditioned. One way to avoid this is to subtract the mean from the X values, so they are centered around 0. (You might also consider dividing the shifted data by its standard deviation. This combined shifting and scaling is usually called standardization, and sometimes loosely whitening. In this case, simply shifting the data is sufficient.)
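To make the conditioning argument concrete, here is a minimal sketch that compares the condition number of the Vandermonde matrix for raw, centered, and standardized X values. It does not use your data file: the X values are proleptic Gregorian day ordinals for 60 consecutive days starting 11.04.2017, which is only a stand-in for what `date2num` returns (the actual numbers depend on Matplotlib's date epoch). The helper `vander_cond` is a name made up for this illustration, and the numbers are illustrative of the trend rather than exactly what polyfit works with internally.

```python
import datetime as dt
import numpy as np

# Stand-in for the date numbers (assumption): day ordinals for 60
# consecutive days starting 11.04.2017. Floats are used so that x**4
# does not overflow an integer type.
start = dt.date(2017, 4, 11).toordinal()
x = start + np.arange(60, dtype=float)

degree = 4
x_centered = x - x.mean()              # shift only
x_standardized = x_centered / x.std()  # shift and scale


def vander_cond(values, deg):
    """Condition number of the Vandermonde matrix for a degree-`deg` fit."""
    return np.linalg.cond(np.vander(values, deg + 1))


cond_raw = vander_cond(x, degree)
cond_centered = vander_cond(x_centered, degree)
cond_standardized = vander_cond(x_standardized, degree)

print("raw:          {:.3e}".format(cond_raw))
print("centered:     {:.3e}".format(cond_centered))
print("standardized: {:.3e}".format(cond_standardized))
```

A larger condition number means the fitted coefficients are more sensitive to rounding error, so the raw values should come out far worse than the shifted or standardized ones.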

Here's a modified version of the fitting and plotting code that does this shift of the X values (I tweaked the colors and style a bit, too):

colors = ['teal', 'darkgreen', 'black']
linestyles = ['-', ':', '--']
alphas = [1, 1, 0.25]
# Shift X so that it is centered around 0 before fitting.
mu = X_train.mean()
for count, degree in enumerate([2, 3, 4]):
    coeffs = np.polyfit(X_train - mu, Y_train, degree)
    f = np.poly1d(coeffs)
    x_line = np.linspace(X[0], X[-1], 50)
    x_line_plot = mdates.num2date(x_line)
    # The polynomial was fit on shifted values, so evaluate it on shifted values too.
    y_line = f(x_line - mu)
    plt.plot(x_line_plot, y_line, linestyles[count], color=colors[count],
             linewidth=1+2*(count==2), alpha=alphas[count],
             label="degree {}".format(degree))
    print(coeffs)

It turns out the degree 3 and degree 4 curves are still close, but they are quite different from the degree 2 curve:

[Image: plot of the degree 2, 3, and 4 fits]
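As an alternative to shifting by hand, NumPy's newer polynomial API can do the rescaling for you: `numpy.polynomial.Polynomial.fit` maps the X range onto [-1, 1] before fitting, and the returned object remembers that mapping when you evaluate it. This is a different routine from the `np.polyfit` call used above, so treat the following as a rough sketch. The X values are again day ordinals standing in for `date2num` output, and the Y values are synthetic (a noisy sine wave), not your exchange rates.

```python
import datetime as dt
import numpy as np
from numpy.polynomial import Polynomial

# Stand-in X values (assumption): day ordinals for 60 consecutive days.
start = dt.date(2017, 4, 11).toordinal()
x = start + np.arange(60, dtype=float)

# Synthetic Y values for illustration only, not the real exchange rates.
rng = np.random.default_rng(0)
y = 9.6 + 0.1 * np.sin((x - x[0]) / 20.0) + 0.01 * rng.standard_normal(x.size)

degree = 4

# Polynomial.fit rescales x internally, so the raw values can be passed in.
p = Polynomial.fit(x, y, degree)
y_auto = p(x)  # evaluate directly on the original x values

# Manual shift with polyfit, for comparison.
mu = x.mean()
coeffs = np.polyfit(x - mu, y, degree)
y_manual = np.polyval(coeffs, x - mu)

max_diff = np.max(np.abs(y_auto - y_manual))
print("largest difference between the two fits:", max_diff)
```

Both approaches fit the same degree 4 polynomial, so the fitted curves should agree closely. Note that the coefficients stored in `p` refer to the rescaled variable; `p.convert()` returns them in terms of the original X, which brings back the large, awkward numbers.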

Upvotes: 4
