Reputation: 2210
I am doing three things: linear regression, linear classification and logistic regression.
I am confused about scaling the data: I am not sure whether I am scaling it correctly. I would also like to change the look of my surface plots in MATLAB.
The data is two-dimensional (x and y), with x ranging from 15000 to 80000 and y from 1000 to 5500.
The two regression lines for class 1 and class 0 were found using linear regression. For this I scaled the data, calculated the weights, and applied the weights obtained for the scaled data to the unscaled data as well, as seen in the picture. I guess this is correct because the weights just define the slope. However, the data would not be separable as long as it is scaled from 0 to 1, as seen in the following image.
Now I am confused about when I should scale my data, because for the linear classification I could not scale the data: it obviously would not be separable. Without scaling, I found the following separating plane using the gradient descent algorithm:
My first question concerns the MATLAB surf plot: how do I get a solid-looking separating plane?
For logistic regression it was necessary to scale again, I guess because of the range of the logistic function 1/(1 + exp(-w*x)).
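One plausible mechanism, offered as a sketch rather than a diagnosis of this particular data: with raw inputs in the tens of thousands, w*x is huge for almost any non-tiny weights, so the logistic function saturates at exactly 0 or 1 in floating point and its derivative s(1 - s) vanishes, which stalls gradient ascent. The Python below (standard library only) compares the same hypothetical weights on one raw point and on the same point min-max scaled with the ranges quoted in the question.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def linear_score(w, x, y):
    """w0 + w1*x + w2*y, i.e. w*x with a constant 1 for the bias."""
    return w[0] + w[1] * x + w[2] * y

# Hypothetical weights [w0, w1, w2] and a hypothetical point inside the stated ranges.
w = [0.1, 0.1, 0.1]
raw = (50000.0, 3000.0)
# Same point min-max scaled with the ranges quoted in the question (an assumption).
scaled = ((50000.0 - 15000.0) / (80000.0 - 15000.0),
          (3000.0 - 1000.0) / (5500.0 - 1000.0))

z_raw = linear_score(w, *raw)
z_scaled = linear_score(w, *scaled)
# Raw input: the sigmoid saturates and the gradient is zero.
print("raw   ", z_raw, sigmoid(z_raw), sigmoid_grad(z_raw))
# Scaled input: the sigmoid is near 0.5 and the gradient is close to its maximum 0.25.
print("scaled", z_scaled, sigmoid(z_scaled), sigmoid_grad(z_scaled))
```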
I scaled it using this:
data = (values - repmat(min(values,[],1),size(values,1),1))*spdiags(1./(max(values,[],1)-min(values,[],1))',0,size(values,2),size(values,2))
which subtracts the column minimum from the original values and divides by the column range (max minus min). After scaling, both x and y range from 0 to 1. The weights were calculated using the gradient ascent algorithm and found to be
w = 0.2493 33.7885 -36.0428
for the scaled data set and
w = 0.7610 269.3073 -102.6686
for the unscaled data.
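For readers who want to reproduce the steps outside MATLAB, here is a minimal Python sketch (standard library only, not the asker's code) of the same pipeline: column-wise min-max scaling as in the MATLAB one-liner above, followed by batch gradient ascent on the logistic-regression log-likelihood. The data points, the step size `alpha` and the iteration count are hypothetical. The function also returns the column minima and ranges, since anything later evaluated with the scaled-data weights has to be passed through the same transform.

```python
import math

def min_max_scale(rows):
    """Column-wise (value - column min) / (column max - column min).
    Returns the scaled rows plus the mins and ranges used."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    ranges = [max(c) - min(c) for c in cols]
    scaled = [[(v - m) / r for v, m, r in zip(row, mins, ranges)] for row in rows]
    return scaled, mins, ranges

def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, labels, alpha=0.5, iterations=5000):
    """Batch gradient ascent on the log-likelihood.
    Each row of X is (x, y); a constant 1 is prepended for the bias w0."""
    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for (x, y), t in zip(X, labels):
            features = (1.0, x, y)
            err = t - sigmoid(sum(wi * fi for wi, fi in zip(w, features)))
            for k in range(3):
                grad[k] += err * features[k]
        w = [wi + alpha * g for wi, g in zip(w, grad)]
    return w

# Hypothetical points inside the question's ranges (not the asker's data).
points = [(70000, 1500), (75000, 2000), (60000, 1200),
          (20000, 5000), (25000, 4500), (30000, 5200)]
labels = [1, 1, 1, 0, 0, 0]

scaled, mins, ranges = min_max_scale(points)
w_scaled = fit_logistic(scaled, labels)
predictions = [int(sigmoid(w_scaled[0] + w_scaled[1] * x + w_scaled[2] * y) >= 0.5)
               for x, y in scaled]
print(w_scaled, predictions)
```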
The following image shows the result with the scaled data:
The basic question is: when should I scale my data, and when should I use the scaled or the unscaled data set?
scale data -> calc weights using scaled data -> plot using scaled or unscaled data?
or
calc weights using unscaled data -> plot using unscaled data?
I would like the logistic function plot to range from 15000 to 80000 in x and from 1000 to 5500 in y as well. When I plot the unscaled version it looks like this, because the logistic function ranges from 0 to 1:
Is there a better command for plotting the surface, such as mesh or trisurf?
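One way to get axes in original units while keeping the weights fitted on scaled data is to build the plotting grid in original units and scale each grid point with the same column minima and ranges used for the training data before applying the weights. The Python sketch below assumes hypothetical scaled weights and uses the ranges quoted in the question as the scaling constants. Rows follow y and columns follow x, the same layout MATLAB's `meshgrid(xs, ys)` produces, so the equivalent MATLAB matrices could be passed to `surf(X, Y, P)` or `mesh(X, Y, P)`; the vertical axis still runs from 0 to 1 because it is a probability.

```python
import math

def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linspace(a, b, n):
    """n evenly spaced values from a to b inclusive (n >= 2)."""
    return [a + (b - a) * i / (n - 1) for i in range(n)]

def logistic_surface(w_scaled, mins, ranges, xs, ys):
    """P(class 1) on a grid given in ORIGINAL units.
    Each grid point is min-max scaled with the training mins/ranges before
    the scaled-data weights are applied. P[i][j] belongs to (xs[j], ys[i])."""
    grid = []
    for y in ys:
        row = []
        for x in xs:
            sx = (x - mins[0]) / ranges[0]
            sy = (y - mins[1]) / ranges[1]
            row.append(sigmoid(w_scaled[0] + w_scaled[1] * sx + w_scaled[2] * sy))
        grid.append(row)
    return grid

# Hypothetical weights fitted on scaled data; scaling constants assumed from the stated ranges.
w_scaled = [0.0, 10.0, -10.0]
mins = [15000.0, 1000.0]
ranges = [80000.0 - 15000.0, 5500.0 - 1000.0]

xs = linspace(15000.0, 80000.0, 14)
ys = linspace(1000.0, 5500.0, 10)
P = logistic_surface(w_scaled, mins, ranges, xs, ys)
print(len(P), len(P[0]), P[0][-1], P[-1][0])
```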
Upvotes: 0
Views: 595
Reputation: 1769
It is not necessary to normalise your data before performing a linear regression, linear classification or logistic regression - although it won't do any harm, the final results should be unchanged by linear transformations.
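The answer's point can be made concrete for min-max scaling, which is an affine map x' = (x - m)/r per column. A linear score fitted on scaled features can be rewritten exactly in raw units: w1 = w1'/r_x, w2 = w2'/r_y and w0 = w0' - w1'·m_x/r_x - w2'·m_y/r_y. Note that this differs from reusing the scaled weights unchanged on raw data. The Python sketch below takes the scaled weights from the question and, as a loud assumption, uses the rounded data ranges quoted there as the column minima and ranges; the real minima and maxima of the asker's data may differ. It checks that both forms give the same score.

```python
def unscale_weights(w_scaled, mins, ranges):
    """Convert weights for features x' = (x - min) / range into weights
    that give the same score w0 + w1*x + w2*y on raw features."""
    w0, w1, w2 = w_scaled
    return [w0 - w1 * mins[0] / ranges[0] - w2 * mins[1] / ranges[1],
            w1 / ranges[0],
            w2 / ranges[1]]

def score(w, x, y):
    """Linear score w0 + w1*x + w2*y (assumes w multiplies (1, x, y))."""
    return w[0] + w[1] * x + w[2] * y

# Scaled weights from the question; mins/ranges ASSUMED from the quoted ranges.
w_scaled = [0.2493, 33.7885, -36.0428]
mins = [15000.0, 1000.0]
ranges = [65000.0, 4500.0]
w_raw = unscale_weights(w_scaled, mins, ranges)

for x, y in [(15000.0, 1000.0), (47500.0, 3250.0), (80000.0, 5500.0)]:
    sx, sy = (x - mins[0]) / ranges[0], (y - mins[1]) / ranges[1]
    # The two scores agree up to floating-point rounding.
    print(score(w_scaled, sx, sy), score(w_raw, x, y))
```

Because the score is identical, the sign of the score, and so the decision boundary, is the same whether the model is evaluated on scaled or raw inputs.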
I don't think you need to plot a surface at all. You have two-dimensional data f(x,y), so to separate the two classes you would need a line.
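Following the answer's suggestion, and assuming the weights w = [w0 w1 w2] multiply (1, x, y), the separating line is w0 + w1·x + w2·y = 0, which is also where the logistic function equals 0.5. A Python sketch that computes the line's endpoints over the question's x range; the weights are hypothetical and must be in raw units (fitted on unscaled data, or converted from scaled weights). In MATLAB the two endpoints could be drawn over a scatter plot of the data with `plot`.

```python
def boundary_line(w, x_min, x_max):
    """Endpoints of the line w0 + w1*x + w2*y = 0 over [x_min, x_max].
    Raises if w2 == 0, since the boundary is then the vertical line x = -w0/w1."""
    w0, w1, w2 = w
    if w2 == 0:
        raise ValueError("w2 is zero: boundary is the vertical line x = -w0/w1")
    return [(x, -(w0 + w1 * x) / w2) for x in (x_min, x_max)]

# Hypothetical raw-unit weights (not taken from the question).
w = [1.4, 4e-5, -1e-3]
endpoints = boundary_line(w, 15000, 80000)
print(endpoints)
```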
Upvotes: 1