evolved

Reputation: 2210

Scaling for linear regression and classification using MATLAB

I am doing three things: linear regression, linear classification and logistic regression.

I am confused by the scaling of the data: I am not sure whether I scale it correctly. I would also like to change the look of my surface plots in MATLAB.

The data is two dimensional (x and y) with x ranging from 15000 to 80000 and y from 1000 to 5500.

[Image: separable because not scaled]

The two regression lines for class 1 and class 0 were found using linear regression. For this I scaled the data, calculated the weights, and applied the weights obtained from the scaled data to the unscaled data shown in the picture. I guess this is correct because the weights just define the slope. However, the data wouldn't be separable as long as it is scaled from 0 to 1, as seen in the following image.

[Image: data scaled from 0 to 1, not separable]
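A minimal sketch of what undoing the scaling does to a fitted line, assuming both x and y were min-max scaled to [0, 1] and each line was fitted as ys = a + b*xs (the data below are synthetic, chosen only to lie in the ranges quoted above). Mapping back gives y = ymin + yr*(a + b*(x - xmin)/xr), so the intercept changes and the slope is multiplied by yr/xr; the mapped line matches a fit done directly on the unscaled data.

```matlab
% Synthetic (hypothetical) data in the ranges from the question
x = linspace(15000, 80000, 50)';
y = 1000 + 0.05 * (x - 15000) + 100 * sin(x / 5000);

% Min-max scaling of each coordinate to [0, 1]
xmin = min(x);  xr = max(x) - xmin;
ymin = min(y);  yr = max(y) - ymin;
xs = (x - xmin) / xr;
ys = (y - ymin) / yr;

% Least-squares fit on the scaled data: ys ~ a + b*xs
ab = [ones(size(xs)) xs] \ ys;

% Undo the scaling: y = ymin + yr*(a + b*(x - xmin)/xr)
slope     = ab(2) * yr / xr;
intercept = ymin + yr * ab(1) - slope * xmin;

% Direct fit on the unscaled data, for comparison
wu = [ones(size(x)) x] \ y;
fprintf('mapped: %g + %g*x   direct: %g + %g*x\n', intercept, slope, wu(1), wu(2));
```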

Now I am confused about when I should scale my data, because for the linear classification I couldn't scale the data: it obviously wouldn't be separable. Without scaling, I found the following separating plane using a gradient descent algorithm:

[Image: separating plane found by gradient descent on unscaled data]
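Min-max scaling is a linear change of coordinates, so it cannot change whether the two classes are linearly separable; what it changes is how gradient descent behaves. A minimal sketch, assuming a least-squares linear classifier with 0/1 targets (the question does not state the cost function) and synthetic grid data with a hypothetical labelling rule: on scaled features, a fixed learning rate of 0.5 converges to the closed-form least-squares weights.

```matlab
% Synthetic grid in the question's ranges with a hypothetical labelling rule
[gx, gy] = meshgrid(linspace(15000, 80000, 10), linspace(1000, 5500, 10));
X = [gx(:) gy(:)];
t = double(X(:, 2) > 0.06 * X(:, 1));   % 0/1 class labels

% Min-max scaling of each column to [0, 1]
xmin = min(X, [], 1);
xr   = max(X, [], 1) - xmin;
Xs   = bsxfun(@rdivide, bsxfun(@minus, X, xmin), xr);
A    = [ones(size(Xs, 1), 1) Xs];       % design matrix with a bias column

% Batch gradient descent on the cost 0.5*mean((A*w - t).^2)
w   = zeros(3, 1);
eta = 0.5;                              % fixed learning rate, stable here because features lie in [0, 1]
for k = 1:5000
    grad = A' * (A * w - t) / numel(t); % gradient of the cost above
    w = w - eta * grad;
end

% Closed-form least-squares weights for comparison
wls = A \ t;
disp([w wls]);
```

On the raw columns (x up to 80000) the entries of A'*A span roughly nine orders of magnitude, so any learning rate small enough to be stable makes progress on the bias weight very slow. That is the practical reason to scale before gradient descent.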

My first question concerns the MATLAB surf plot: how do I get a solid-looking separating plane?
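One way to get a solid-looking plane, assuming it is the linear function z = w(1) + w(2)*x + w(3)*y drawn over the data range (the weights below are placeholders, not values from the question): evaluate it on a coarse meshgrid, then give the surf object a single face colour and hide its edge lines. FaceAlpha adds slight transparency so data points behind the plane stay visible.

```matlab
% Placeholder (hypothetical) weights for the plane z = w(1) + w(2)*x + w(3)*y
w = [0.5; 2e-5; -3e-4];

% Coarse grid over the data range; a plane needs only a few grid points
[Xg, Yg] = meshgrid(linspace(15000, 80000, 20), linspace(1000, 5500, 20));
Zg = w(1) + w(2) * Xg + w(3) * Yg;

figure;
h = surf(Xg, Yg, Zg);
% One solid face colour, no mesh lines, slight transparency so data points stay visible
set(h, 'FaceColor', [0.3 0.6 0.9], 'EdgeColor', 'none', 'FaceAlpha', 0.8);
xlabel('x'); ylabel('y'); zlabel('w_0 + w_1 x + w_2 y');
```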

For logistic regression it was necessary to scale the data again, I guess because of the range of the regression function 1/(1 + exp(-w*x)).
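A small numeric illustration of why the range of the logistic function matters, using a hypothetical weight of 0.01: on raw x values in the tens of thousands, w*x is so large that 1/(1 + exp(-w*x)) rounds to exactly 1 in double precision, so small changes of w no longer change the output (the derivative sigma*(1 - sigma) is exactly 0). On min-max scaled inputs the same function stays in its responsive range.

```matlab
sigma = @(z) 1 ./ (1 + exp(-z));        % logistic function

w    = 0.01;                            % hypothetical weight
xraw = [15000 47500 80000];             % raw x values from the question's range
xs   = (xraw - 15000) / (80000 - 15000);% the same values min-max scaled to [0, 1]

s_raw    = sigma(w * xraw);             % w*x is about 150, 475 and 800: saturated at 1
s_scaled = sigma(w * xs);               % w*x between 0 and 0.01: close to 0.5

% Derivative sigma*(1 - sigma): exactly 0 for the raw inputs, about 0.25 for the scaled ones
disp([s_raw; s_raw .* (1 - s_raw)]);
disp([s_scaled; s_scaled .* (1 - s_scaled)]);
```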

I scaled the data using this:

data = (values - repmat(min(values,[],1),size(values,1),1))*spdiags(1./(max(values,[],1)-min(values,[],1))',0,size(values,2),size(values,2))

which subtracts the minimum from the original values and divides by the range. After scaling, both x and y range from 0 to 1. The weights were calculated using a gradient ascent algorithm and found to be

w = 0.2493   33.7885  -36.0428

for the scaled data set and

w = 0.7610  269.3073 -102.6686

for the unscaled data.
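The scaling one-liner from the question can be written more readably with bsxfun (in MATLAB R2016b and later, plain implicit expansion such as values - vmin also works). A minimal sketch with made-up sample values; keeping the column minima and ranges makes it possible to scale new points the same way and to undo the scaling for plotting. The last statement reproduces the original one-liner for comparison.

```matlab
% Made-up sample values: column 1 is x, column 2 is y
values = [15000 1000; 40000 3000; 80000 5500; 52000 2400];

% Per-column minimum and range, kept so the same transform can be reused
vmin   = min(values, [], 1);
vrange = max(values, [], 1) - vmin;

% Min-max scaling: subtract the minimum, divide by the range (each column -> [0, 1])
data = bsxfun(@rdivide, bsxfun(@minus, values, vmin), vrange);

% Inverse transform back to the original units
restored = bsxfun(@plus, bsxfun(@times, data, vrange), vmin);

% The one-liner from the question, for comparison
data_orig = (values - repmat(min(values,[],1),size(values,1),1)) * ...
    spdiags(1./(max(values,[],1)-min(values,[],1))',0,size(values,2),size(values,2));
```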

The following image is with scaled data:

[Image: logistic regression on scaled data]

The basic question is: when should I scale my data, and when should I use the scaled or the unscaled data set?

scale data -> calc weights using scaled data -> plot using scaled or unscaled data?

or

calc weights using unscaled data -> plot using unscaled data? 
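One common pipeline for the question above, as a sketch under stated assumptions: compute the weights on the scaled data, then rewrite them in original units so that predictions and plots can use unscaled x and y directly. The data, flipped labels, learning rate and iteration count are hypothetical, and the fit is plain batch gradient ascent on the log-likelihood (gradient averaged over the points), which may differ in detail from the implementation used in the question. Because the conversion is exact, both weight vectors give the same probabilities.

```matlab
% Hypothetical two-class data in the question's ranges
[gx, gy] = meshgrid(linspace(15000, 80000, 12), linspace(1000, 5500, 12));
X = [gx(:) gy(:)];
t = double(X(:, 2) > 0.06 * X(:, 1));
t(1:7:end) = 1 - t(1:7:end);            % flip some labels so the classes overlap

% 1) Scale: min-max per column
xmin = min(X, [], 1);
xr   = max(X, [], 1) - xmin;
Xs   = bsxfun(@rdivide, bsxfun(@minus, X, xmin), xr);

% 2) Fit on the scaled data: batch gradient ascent on the log-likelihood
sigma  = @(z) 1 ./ (1 + exp(-z));
A      = [ones(size(Xs, 1), 1) Xs];
ws     = zeros(3, 1);
loglik = @(w) sum(t .* log(sigma(A * w)) + (1 - t) .* log(1 - sigma(A * w)));
ll0    = loglik(ws);
eta    = 1;                             % hypothetical step size for the averaged gradient
for k = 1:5000
    ws = ws + eta * A' * (t - sigma(A * ws)) / numel(t);
end

% 3) Convert to original units: w0 + w1*(x - xmin1)/xr1 + w2*(y - xmin2)/xr2
wraw = [ws(1) - sum(ws(2:3)' .* xmin ./ xr); ws(2:3) ./ xr'];

% 4) Predict (or plot) directly in original units
p_scaled = sigma(A * ws);
p_raw    = sigma([ones(size(X, 1), 1) X] * wraw);
```

Converting the weights is only needed when you want to evaluate the model on raw inputs; equivalently, you can scale each new point with xmin and xr before applying ws.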

I would like the logistic function plot to also range from 15000 to 80000 (x) and from 1000 to 5500 (y). When I plot the unscaled version it looks like this, because the logistic function ranges from 0 to 1:

[Image: unscaled logistic regression]

Is there a better command to plot the surface, such as mesh or trisurf?
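To show the logistic surface over the original ranges (x from 15000 to 80000, y from 1000 to 5500), one option is to build the grid in original units and scale the grid points with the training minima and ranges before applying the scaled weights. This sketch assumes the minima and ranges equal the ranges quoted in the question and that the reported scaled weights are ordered [bias, x, y]. The z axis still runs from 0 to 1, since that is the range of the logistic function. surf (filled faces) or mesh (wireframe) suit a function evaluated on a regular meshgrid; trisurf is meant for scattered points joined by a triangulation.

```matlab
% Scaled weights reported in the question; the ordering [bias, x, y] is assumed
w = [0.2493; 33.7885; -36.0428];

% Assumed training minima and ranges (taken from the ranges quoted in the question)
xmin = [15000 1000];
xr   = [80000 5500] - xmin;

% Grid in ORIGINAL units, so the axes show the raw x and y ranges
[Xg, Yg] = meshgrid(linspace(15000, 80000, 60), linspace(1000, 5500, 60));

% Scale the grid points exactly as the training data were scaled, then apply w
Xs = (Xg - xmin(1)) / xr(1);
Ys = (Yg - xmin(2)) / xr(2);
P  = 1 ./ (1 + exp(-(w(1) + w(2) * Xs + w(3) * Ys)));

figure;
surf(Xg, Yg, P, 'EdgeColor', 'none');   % filled surface on a regular grid, no mesh lines
xlabel('x'); ylabel('y'); zlabel('P(class 1)');
```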

Upvotes: 0


Answers (1)

RPM

Reputation: 1769

It is not necessary to normalise your data before performing a linear regression, linear classification or logistic regression. Although it won't do any harm, the final results should be unchanged by linear transformations of the data.
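A quick check of that claim for a least-squares fit, on synthetic data with hypothetical 0/1 targets: fitting on raw and on min-max scaled inputs gives different weights but identical fitted values, because with a bias column both design matrices span the same column space. For iterative fits the optimum maps over in the same way, although the number of iterations needed to reach it can differ a lot.

```matlab
% Synthetic inputs in the question's ranges and hypothetical 0/1 targets
[gx, gy] = meshgrid(linspace(15000, 80000, 8), linspace(1000, 5500, 8));
X = [gx(:) gy(:)];
t = double(X(:, 2) > 0.06 * X(:, 1));

% Min-max scaled copy of the inputs
xmin = min(X, [], 1);
Xs = bsxfun(@rdivide, bsxfun(@minus, X, xmin), max(X, [], 1) - xmin);

% Least-squares fits with a bias column, on raw and on scaled inputs
w_raw    = [ones(64, 1) X]  \ t;
w_scaled = [ones(64, 1) Xs] \ t;

% Different weights, same fitted values
fit_raw    = [ones(64, 1) X]  * w_raw;
fit_scaled = [ones(64, 1) Xs] * w_scaled;
disp([w_raw w_scaled]);
disp(max(abs(fit_raw - fit_scaled)));
```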

I don't think you need to plot a surface at all. You have two-dimensional data (x, y), so to separate the two classes you only need a line.
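A minimal sketch of that suggestion, assuming a linear decision function w(1) + w(2)*x + w(3)*y expressed in original units (the weights below are placeholders): the class boundary is the line where the function equals a threshold c, i.e. y = (c - w(1) - w(2)*x) / w(3). For 0/1 least-squares targets c = 0.5; for a logistic model, probability 0.5 corresponds to c = 0.

```matlab
% Placeholder (hypothetical) weights in original units for w(1) + w(2)*x + w(3)*y
w = [0.4; -1.2e-5; 2e-4];
c = 0.5;    % threshold: 0.5 for 0/1 least-squares targets, 0 for a logistic model's P = 0.5

% Solve w(1) + w(2)*x + w(3)*y = c for y (requires w(3) ~= 0)
xl = linspace(15000, 80000, 100);
yl = (c - w(1) - w(2) * xl) / w(3);

figure;
plot(xl, yl, 'k-', 'LineWidth', 2);     % the separating line; overlay the data with hold on
xlabel('x'); ylabel('y');
```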

Upvotes: 1
