Reputation: 33
Maybe Python geniuses can help me here. I have a machine learning algorithm where I obtain my classifier like this:
clf = MultinomialNB().fit(X_train_tfidf, y_train)
To get it to predict I simply write
print(clf.predict(count_vect.transform(["I am happy today"])))
and it gives me back the category in which this sentence falls.
Now I want to use this clf to predict results for a whole column. I have another data frame, dfnew, which consists of 2 columns; the second column, with the header 'Additional Information', holds the strings that I need to pass to clf.predict.
But this column has blank (NaN) values, as in the figure below.
How do I pass this 'Additional Information' column to clf.predict() so that it skips the NaN ones and only predicts for the strings that are present? Ideally, I would want the results to be added to the same data frame dfnew as a third column.
Any help or guidance would be really appreciated. Thank you
Upvotes: 0
Views: 42
Reputation: 109
I guess what you're looking for is the apply method. Simply:
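import pandas as pd

def classify_when_possible(add_info):
    # pd.isna catches both None and NaN; blank cells in a DataFrame are
    # usually NaN, which an `is None` check would miss.
    if pd.isna(add_info):
        return None
    # predict returns an array with one element per input, so take the first.
    return clf.predict(count_vect.transform([add_info]))[0]

# If you want the result added to your dataframe as a new column:
dfnew['third column'] = dfnew['Additional Information'].apply(classify_when_possible)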
Note: there are many ways to use apply; some are more efficient than others, depending on the use case.
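As one possibly faster alternative, you could skip the row-by-row apply and predict all non-missing rows in a single call, using a boolean mask. The following is only a sketch: the training sentences, the labels, the 'ID' column and the 'Predicted Category' column name are made up so that the example runs on its own. The toy classifier here is fitted on raw counts for brevity; if your clf was fitted on tf-idf features, the text should go through the same tf-idf transform before predict.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data, only here so the sketch is runnable.
train_texts = ["I am happy today", "what a great day", "I feel sad", "this is terrible"]
y_train = ["positive", "positive", "negative", "negative"]

count_vect = CountVectorizer()
clf = MultinomialNB().fit(count_vect.fit_transform(train_texts), y_train)

# Hypothetical stand-in for dfnew, with blank cells in the text column.
dfnew = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Additional Information": ["I am happy today", np.nan, "this is terrible", None],
})

# True for rows that actually contain text (NaN and None are both excluded).
has_text = dfnew["Additional Information"].notna()

# Start with an all-missing column, then fill only the rows that have text.
dfnew["Predicted Category"] = None
dfnew.loc[has_text, "Predicted Category"] = clf.predict(
    count_vect.transform(dfnew.loc[has_text, "Additional Information"])
)

print(dfnew)
```

Rows without text keep a missing value in the new column, and the vectorizer and classifier are each called once instead of once per row.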
Upvotes: 1