Reputation: 197
I am writing a piece of code to identify different 2D shapes using opencv. I get 4 sets of data from each image of a 2D shape and these are stored in the multidimensional array featureVectors.
I am trying to write an svm/svc that takes into account all 4 features obtained from the image. I have been able to make it work with just 2 features but when i try all 4 my graph comes out looking like this.
My values for featureVectors are:
[[ 4.00000000e+00 1.74371349e-03 6.49705560e-01 9.07957236e+01]
[ 4.00000000e+00 4.60937436e-02 1.97642179e-01 9.02041472e+01]
[ 1.00000000e+00 1.18553450e-03 3.03491372e-01 6.03489082e+01]
[ 1.00000000e+00 1.54552898e-02 8.38091425e-01 1.09021207e+02]
[ 3.00000000e+00 1.69961646e-02 4.13691915e+01 1.36838300e+02]]
And my Labels are:
[[2]
[2]
[0]
[0]
[1]]
Here is my code for the SVM:
#Saving featureVectors to a csv file
values1 = featureVectors
header1 = ["Number of Sides", "Standard Deviation of Number of Sides/Perimeter",
"Standard Deviation of the Angles", "Largest Angle"]
my_df = pd.DataFrame(featureVectors)
my_df.to_csv('featureVectors.csv', index=True, header=header1)
#Saving labels to a csv file
values2 = labels
header2 = ["Label"]
my_df = pd.DataFrame(labels)
my_df.to_csv('labels.csv', index=True, header=header2)
#Writing the SVM
def Build_Data_Set(features = header1, features1 = header2):
data_df = pd.DataFrame.from_csv("featureVectors.csv")
#data_df = data_df[:250]
X = np.array(data_df[features].values)
data_df2 = pd.DataFrame.from_csv("labels.csv")
y = np.array(data_df2[features1].values)
#print(X)
#print(y)
return X,y
def Analysis():
X,y = Build_Data_Set()
clf = svm.SVC(kernel = 'linear', C = 1.0)
clf.fit(X, y)
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(0,5)
yy = np.linspace(0,185)
h0 = plt.plot(xx,yy, "k-", label="non weighted")
plt.scatter(X[:, 0],X[:, 1],c=y)
plt.ylabel("Maximum Angle (Degrees)")
plt.xlabel("Number Of Sides")
plt.title('Shapes')
plt.legend()
plt.show()
Analysis()
I have only used 5 data sets(shapes) so far because I knew it wasn't working correctly.
Upvotes: 2
Views: 15438
Reputation: 1551
The SVM part of your code is actually correct. The plotting part around it is not, and given the code I'll try to give you some pointers.
First of all:
another example I found(i cant find the link again) said to do that
Copying code without understanding it will probably cause more problems than it solves. Given your code, I'm assuming you used this example as a starter.
plt.scatter(X[:, 0],X[:, 1],c=y)
In the sk-learn example, this snippet is used to plot data points, coloring them according to their label. This works because in the example we're dealing with 2-dimensional data, so this is fine. The data you're dealing with is 4-dimensional, so you're actually just plotting the first two dimensions.
plt.scatter(X[:, 0], y, c=y)
on the other hand makes no sense.
xx = np.linspace(0,5)
yy = np.linspace(0,185)
h0 = plt.plot(xx,yy, "k-", label="non weighted")
Your decision boundary has actually nothing to do with the actual decision boundary. It's just a plot of y over x of your coordinate system. (In addition to that, you're dealing with multi class data, so you'll have as much decision boundaries as you have classes.)
Now your actual problem is data dimensionality. You're trying to plot 4-dimensional data in a 2d plot, which simply won't work. A possible approach would be to perform dimensionality reduction to map your 4d data into a lower dimensional space, so if you want to, I'd suggest you reading e.g. the excellent sklearn documentation for an introduction to SVMs and in addition something about dimensionality reduction.
Upvotes: 2