mohd.gadi

Reputation: 515

SVM visualization really random and inaccurate

import numpy as np
from optparse import OptionParser
from sklearn import svm
from sklearn.manifold import TSNE

# UtteranceEmbedder is my own helper class (sketched below)
def vec(utterance):
    embedder = UtteranceEmbedder(utterance)
    word2vec = embedder.as_word2vec()
    bow = embedder.as_bow_vec()
    ret = np.concatenate([word2vec, bow])
    # pad every feature vector to a fixed length of 500
    return np.pad(ret, [0, 500 - len(ret)], "constant")

op = OptionParser()
op.add_option(
    "-f", "--file", help="path to file containing utterances to visualize",
    action="store", type="string", dest="path"
)

(opt, args) = op.parse_args()
if opt.path is None or len(opt.path) == 0:
    op.error("path to file containing newline separated utterances must be specified")

vectors = []
with open(opt.path) as f:
    content = f.readlines()
    # you may also want to remove whitespace characters like `\n` at the end of each line
    for utterance in [x.strip() for x in content]:
        vectors.append(vec(utterance))


# note: recent scikit-learn versions require perplexity < n_samples
X = TSNE(n_components=2).fit_transform(np.array(vectors))
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
clf = svm.SVC(decision_function_shape='ovo', class_weight="balanced")
clf.fit(X, y)

An utterance is a short phrase. I tokenize the utterance, extract its word2vec vector from Google's pretrained 300-dimensional model, append a bag-of-words vector, and fit the data (a sketch of the embedding step follows the data below). Following is my training data, Input.txt:

yea
yeah
yaa
say
ok
okay
no
nope
not interested
dont
cant
cannot
not now
not really
not at the moment
no thank you
sorry no
sorry
not active
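
UtteranceEmbedder is my own helper class. Roughly, it does something like this (a sketch; the model path and the bag-of-words vocabulary are illustrative, using gensim for the pretrained vectors):

import numpy as np
from gensim.models import KeyedVectors

# Pretrained Google News word2vec vectors (300 dimensions); path is illustrative
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Toy bag-of-words vocabulary, illustrative only
VOCAB = ["yes", "no", "ok", "not", "sorry", "interested"]

class UtteranceEmbedder:
    def __init__(self, utterance):
        self.tokens = utterance.lower().split()

    def as_word2vec(self):
        # Average the word2vec vectors of all in-vocabulary tokens
        vecs = [w2v[t] for t in self.tokens if t in w2v]
        return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

    def as_bow_vec(self):
        # Term counts over the fixed vocabulary
        return np.array([self.tokens.count(w) for w in VOCAB], dtype=float)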

As you can see, it is a simple case of opposites. But when I plot the points using matplotlib, they look as random as possible and are not linearly separable. Triangles depict the yes class and the other markers the nos.
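
This is roughly how I plot them, for reference (a sketch):

import matplotlib.pyplot as plt

# Triangles for the "yes" class (label 0), circles for the "no" class (label 1)
plt.scatter(X[y == 0, 0], X[y == 0, 1], marker="^", label="yes")
plt.scatter(X[y == 1, 0], X[y == 1, 1], marker="o", label="no")
plt.legend()
plt.show()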

What could be the cause of such a completely inaccurate result?

Upvotes: 0

Views: 37

Answers (1)

Florian H

Reputation: 3082

You are using an SVC with the default parameters. In particular, your kernel is an RBF kernel (also known as Gaussian). How SVMs work is a little too complicated to explain here, and with a kernel it is even more complex. If you are interested, I can recommend this lecture from MIT:

https://www.youtube.com/watch?v=_PwhiWxHK8o

But in short: your Gaussian kernel still performs the separation linearly, just in a higher-dimensional vector space. It transforms your data into that space and separates it there with a hyperplane. If you later visualize your data with your support vectors in two dimensions, the separation does not look linear, but in your "kernel vector space" it is.
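
If you want to see that boundary in your two-dimensional t-SNE space, you can evaluate the decision function on a grid (a sketch, reusing the fitted clf and X from your question):

import numpy as np
import matplotlib.pyplot as plt

# Evaluate the fitted classifier on a grid covering the 2-d points
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# The zero level set is the (non-linear looking) boundary in this projection
plt.contour(xx, yy, Z, levels=[0])
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()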

Btw. you should also think about standardization.
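
For example with scikit-learn's StandardScaler in a pipeline (a sketch):

from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale features to zero mean and unit variance before fitting the SVC
clf = make_pipeline(StandardScaler(), svm.SVC(class_weight="balanced"))
clf.fit(X, y)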

Upvotes: 0
