Reputation: 515
import numpy as np
from optparse import OptionParser
from sklearn import svm
from sklearn.manifold import TSNE

def vec(utterance):
    embedder = UtteranceEmbedder(utterance)
    word2vec = embedder.as_word2vec()
    bow = embedder.as_bow_vec()
    ret = np.concatenate([word2vec, bow])
    # pad every vector to a fixed length of 500
    return np.pad(ret, [0, 500 - len(ret)], "constant")

op = OptionParser()
op.add_option(
    "-f", "--file", help="path to file containing utterances to visualize",
    action="store", type="string", dest="path"
)
(opt, args) = op.parse_args()
if opt.path is None or len(opt.path) == 0:
    op.error("path to file containing newline separated utterances must be specified")

vectors = []
with open(opt.path) as f:
    content = f.readlines()
# remove whitespace characters like `\n` at the end of each line
for utterance in [x.strip() for x in content]:
    vectors.append(vec(utterance))

vectors_reduced = TSNE(n_components=2).fit_transform(np.array(vectors))

X = np.array(vectors_reduced)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
clf = svm.SVC(decision_function_shape='ovo', class_weight="balanced")
clf.fit(X, y)
An utterance will be a phrase. I tokenize the utterance, extract a word2vec vector from the Google 300B pretrained model, append a bag-of-words vector, and fit the data. Following is my training data: Input.txt
yea
yeah
yaa
say
ok
okay
no
nope
not interested
dont
cant
cannot
not now
not really
not at the moment
no thank you
sorry no
sorry
not active
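Note that the hard-coded label array in the code has to line up exactly with this list: the first 6 utterances are affirmative and the remaining 13 are negative. A small sketch (counts read off the list above; the variable names are my own) that builds the labels from those counts instead of typing them out:

```python
import numpy as np

# 6 affirmative utterances ("yea" ... "okay") followed by
# 13 negative ones ("no" ... "not active"), as in Input.txt
n_affirmative = 6
n_negative = 13

# label 0 = affirmative, 1 = negative, matching the y array in the question
y = np.array([0] * n_affirmative + [1] * n_negative)

print(len(y))  # one label per line of Input.txt
```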
As you can see, it's a simple case of opposites. But when I plot the points using matplotlib, they look as random as possible, not linearly separable.
What could be the cause of such a completely inaccurate result?
Upvotes: 0
Views: 37
Reputation: 3082
You are using an SVC with the default parameters. In particular, your kernel is an RBF kernel (also known as Gaussian). How SVMs work is a little bit too complicated to post here, and with a kernel it's even more complex. If you are interested, I can recommend the lecture from MIT:
https://www.youtube.com/watch?v=_PwhiWxHK8o
But in short: your Gaussian kernel still does the separation linearly, just in a higher-dimensional vector space. It transforms your data into that space and separates it there with a hyperplane. If you later visualize your data and support vectors in two dimensions, the separation is not linear, but in your "kernel vector space" it is.
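To make that concrete, here is a minimal numpy sketch of the similarity the RBF kernel computes between two points (gamma is a free parameter here; scikit-learn chooses its own default):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian/RBF kernel: exp(-gamma * ||x - z||^2).

    The SVC uses this similarity implicitly; the corresponding feature
    space is infinite-dimensional, which is why the decision boundary
    can look non-linear in the original 2-D plot."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # distance 5 -> exp(-25), near 0
```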
By the way, you should also think about standardizing your features.
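Standardization means rescaling each feature to zero mean and unit variance before fitting (e.g. with scikit-learn's StandardScaler); a plain-numpy sketch of the same transform:

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit variance.

    Same idea as sklearn.preprocessing.StandardScaler; the RBF kernel is
    distance-based, so features on very different scales can dominate it."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Xs = standardize(X)
print(Xs.mean(axis=0))  # ~[0, 0]
print(Xs.std(axis=0))   # ~[1, 1]
```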
Upvotes: 0