Apply custom function to a column in data frame if the column value is not equal to nan

Question

Maybe the Python geniuses can help me here. I have a machine learning algorithm where I obtain my classifier like this:

clf = MultinomialNB().fit(X_train_tfidf, y_train)

To get it to predict I simply write

print(clf.predict(count_vect.transform(["I am happy today"])))

and it gives me back the category that this sentence falls into. Now I want to use this clf to predict results for a whole column. I have another data frame, dfnew, which consists of 2 columns; the second column, with the header 'Additional Information', has the strings that I need to pass to clf.predict. But this column has blank (NaN) values, as in the figure below.

How do I pass this 'Additional Information' column to clf.predict() so that it skips the NaN values and only predicts for the strings that are present? Ideally, I would like the results to be added to the same data frame dfnew as a third column.

Any help or guidance would be really appreciated. Thank you

Ilyas Moutawwakil · Accepted Answer

I guess what you're looking for is the apply method. Simply:

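import pandas as pd

def classify_when_possible(add_info):
    # Blank cells show up as NaN, which is not None, so test with pd.isna
    if pd.isna(add_info):
        return None
    # predict returns an array with one label per input; take the single label
    return clf.predict(count_vect.transform([add_info]))[0]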

# if you want the result added to your dataframe as a new column
dfnew['third column'] = dfnew['Additional Information'].apply(classify_when_possible)

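Note: there are many ways to use apply, and some are faster than others depending on the use case. Calling predict once per row is fine for a small data frame but gets slow on a large one.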
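If the row-by-row call turns out to be slow, one alternative is to select the non-missing rows with a boolean mask and predict for all of them in a single call. The following is only a sketch: predict_non_null, predict_batch and the 'Predicted Category' column name are made up for illustration, and it assumes the blank cells are real NaN/None values rather than empty strings.

```python
import numpy as np
import pandas as pd


def predict_non_null(df, text_col, predict_batch, out_col="Predicted Category"):
    """Return a copy of df with out_col filled in for rows where text_col is present.

    predict_batch is a hypothetical callable: it takes a list of strings and
    returns one label per string.
    """
    result = df.copy()
    mask = result[text_col].notna()
    # Start with an all-missing object column so skipped rows stay NaN.
    result[out_col] = pd.Series(np.nan, index=result.index, dtype=object)
    if mask.any():
        texts = result.loc[mask, text_col].astype(str).tolist()
        # One batched call instead of one call per row.
        result.loc[mask, out_col] = list(predict_batch(texts))
    return result


# Usage with the objects from the question (clf and count_vect already fitted):
# dfnew = predict_non_null(
#     dfnew,
#     "Additional Information",
#     lambda texts: clf.predict(count_vect.transform(texts)),
# )
```

Passing the prediction step in as a callable keeps the helper independent of the particular classifier, so the same function works if the vectorizer or model is swapped later.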
