esemve

Reputation: 360

Python3 text labeling

I don't know where to start with this, because I'm only now learning about neural networks. I have a big database of sentence/label pairs. For example:

i want take a photo < photo
i go to take a photo < photo
i go to use my camera < photo
i go to eat something < eat
i like my food < eat

When a user writes a new sentence, I want to get a score for every label:

"I go to bed, after i use my camera" < photo: 0.9000, eat: 0.4000, ...

So my question is: where should I start? TensorFlow and scikit-learn look good, but their document classification examples don't show a score for each label :\

Upvotes: 2

Views: 68

Answers (1)

Abhishek Thakur

Reputation: 17015

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]

labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()

# Fit the TF-IDF vocabulary and turn the sentences into a sparse feature matrix
X = tfv.fit_transform(sentences)

# Encode the string labels as integers (alphabetical: "eat" -> 0, "photo" -> 1)
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Hold out part of the data for evaluation, keeping the label proportions
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression()
clf.fit(xtrain, ytrain)
predictions = clf.predict(xtest)

print("Accuracy Score =", metrics.accuracy_score(ytest, predictions))
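One caveat with the code above: the TF-IDF vectorizer is fit on all sentences before the split, so the test sentences leak into the vocabulary and IDF weights. A minimal sketch of a variant, assuming scikit-learn 0.18 or later (for sklearn.model_selection), splits the raw strings first and wraps both steps in a pipeline so the vectorizer only ever sees the training split. LogisticRegression accepts string labels directly, so the LabelEncoder is skipped here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera",
             "i go to eat something", "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

# Split the raw strings first, so the vectorizer never sees the test sentences
xtrain, xtest, ytrain, ytest = train_test_split(
    sentences, labels, stratify=labels, random_state=42)

# The pipeline fits TF-IDF, then logistic regression, on the training split only
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(xtrain, ytrain)

accuracy = metrics.accuracy_score(ytest, model.predict(xtest))
print("Accuracy Score =", accuracy)
```

With only five example sentences the accuracy number is very noisy; it becomes meaningful once the real database is used.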

For new data:

new_sentence = ["this is a new sentence"]
X_Test = tfv.transform(new_sentence)
# One probability per label; the columns follow the order of clf.classes_
print(clf.predict_proba(X_Test))
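predict_proba returns one column per class, in the order of clf.classes_. Because the labels were encoded, those classes are integers, and lbl.inverse_transform maps them back to names. A minimal sketch that produces the "label: score" output asked for in the question, retrained on all five example sentences to stay self-contained (score_labels is a hypothetical helper name). Note that for a single-label LogisticRegression the scores for one sentence sum to 1, unlike the 0.9/0.4 in the question:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera",
             "i go to eat something", "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

clf = LogisticRegression()
clf.fit(X, y)


def score_labels(sentence):
    """Hypothetical helper: return {label name: probability} for one sentence."""
    probs = clf.predict_proba(tfv.transform([sentence]))[0]
    # clf.classes_ holds the encoded ints; inverse_transform maps them back to names
    names = lbl.inverse_transform(clf.classes_)
    return {str(name): float(p) for name, p in zip(names, probs)}


scores = score_labels("I go to bed, after i use my camera")
# Print the labels from most to least likely
for name, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {p:.4f}")
```

Words the vectorizer never saw during training (here "bed" and "after") are simply ignored, so the score is driven by the known words "go", "to", "use", "my" and "camera".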

Upvotes: 1
