Create a data frame with predicted values, real values and original features

Question

I have the following dataset:

import pandas as pd
from sklearn import tree
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

input_data = pd.DataFrame([['This is the news', 0], ['This is the news', 0], ['This is not the news', 1], ['This is not the news', 1], ['This is not the news', 1], ['This is not the news', 1]], columns=('feature1', 'Tag'))

I want to turn the text column into a TF-IDF matrix using the following function:

def TfifdMatrix(inputSet):
    vectorizer = CountVectorizer()
    vectorizer.fit_transform(inputSet)
    print("fit transform done")
    smatrix = vectorizer.transform(inputSet)
    print("transform done")
    smatrix = smatrix.todense()
    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(smatrix)
    tf_idf_matrix = tfidf.transform(smatrix)
    print("transformation done")
    TfidfMatrix = pd.DataFrame(tf_idf_matrix.todense())
    return (TfidfMatrix)

Now I transform the data and append the TF-IDF columns to the original frame, which still holds the tag:

input_data2 = TfifdMatrix(input_data['feature1'])
input_data = pd.concat([input_data, input_data2], axis=1)

Then I create a training set and a test set:

train = input_data.sample(frac=0.8, random_state=1)
test = input_data.loc[~input_data.index.isin(train.index)]

train_outcome = train['Tag'].values
train_features = train.drop('Tag', axis=1)
test_outcome = test['Tag'].values
test_features = test.drop('Tag', axis=1)

test_features2 = test['Tag']

Now I train a decision tree algorithm on it:

my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(train_features.drop('feature1', axis=1), train_outcome)
my_dt_prediction = my_tree_one.predict(test_features.drop('feature1', axis=1))

Now I combine everything to get an overview of the original features, the real outcome, the predicted outcome and the TF-IDF matrix:

df_final = pd.DataFrame(test_features, test_outcome)
df_final['Prediction'] = my_dt_prediction

This, however, gives me the following data:

  feature1   0   1   2   3   4  Prediction
1      NaN NaN NaN NaN NaN NaN           1

Any thoughts on where this goes wrong?
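
One likely cause, offered as a sketch rather than a definitive diagnosis: the second positional argument of pd.DataFrame is index, so pd.DataFrame(test_features, test_outcome) asks pandas to reindex test_features to the labels in test_outcome (here, the label 1). If the single test row carries a different original index label, the result is one row of NaN. The snippet below reproduces this with a small hypothetical stand-in for test_features (the index label 3 and the value 0.5 are made up for illustration) and shows one possible way to build the overview by adding columns to a copy instead.

```python
import pandas as pd

# Hypothetical stand-in for test_features: a single test row whose
# original index label is 3 (label and values are assumptions).
test_features = pd.DataFrame(
    {"feature1": ["This is not the news"], 0: [0.5]},
    index=[3],
)
test_outcome = [1]      # real tag of that row
my_dt_prediction = [1]  # stand-in for the tree's prediction

# The second positional argument is `index`. pandas reindexes the frame
# to label 1, which does not exist in test_features, so every cell is NaN.
broken = pd.DataFrame(test_features, test_outcome)
print(broken)

# One possible fix: keep the test rows as they are and add the real and
# predicted outcomes as new columns.
df_final = test_features.copy()
df_final["Tag"] = test_outcome
df_final["Prediction"] = my_dt_prediction
print(df_final)
```

With this approach the original index labels are preserved, so each row of df_final can still be traced back to its row in input_data.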
