Learn from given data and apply it on new data

Question

I'm a beginner at machine learning and scikit-learn, so this might be a stupid question.

I'm trying to do something like this:

features = [['adam'], ['james'], ['amy']]
labels = ['hello adam', 'hello james', 'hello amy']

# clf stands for some scikit-learn estimator (which one is the open question)
clf = clf.fit(features, labels)

print(clf.predict([['john']]))
# This should give out 'hello john'

Is this possible using scikit-learn?

Thanks in advance!

elyase · Accepted Answer

The principled way to solve this would be sequence-to-sequence learning, which is a more complicated beast and outside scikit-learn's scope.

With enough feature engineering and the correct problem formulation, you can still help a simpler algorithm, like the ones in scikit-learn, achieve this task. There are two main difficulties that need to be tackled:

how to convert your features and your labels into a numeric representation (one-hot, embeddings, ...)
how to encode a variable-length sequence into a fixed-length vector that can be fed to scikit-learn algorithms (bag of words, mean pooling, RNN).
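
To make the two difficulties concrete, here is a minimal sketch of one possible encoding, not a full solution. It assumes a bag of character n-grams (via CountVectorizer) as the fixed-length representation and a 1-nearest-neighbour classifier as the model; both choices, and the variable names, are illustrative assumptions rather than anything prescribed above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

names = ["adam", "james", "amy"]
labels = ["hello adam", "hello james", "hello amy"]

# Bag of character n-grams (assumed encoding): every name, whatever its
# length, becomes a vector with one column per n-gram seen during fit.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(names)

# Assumed model: any scikit-learn classifier could be swapped in here.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, labels)

# An unseen name is encoded with the same vocabulary, so it has the
# same number of columns as the training matrix.
X_new = vectorizer.transform(["john"])
prediction = clf.predict(X_new)[0]

print(X.shape, X_new.shape)
print(prediction)
```

Note that a classifier trained this way can only return one of the labels it saw during training, so it will never produce 'hello john'. That is why the problem formulation matters as much as the encoding: the target has to be something the model can generalise over, rather than the whole output string treated as a single class.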
