Merging results from Prediction to Original Data frame?

Question

I have built a machine learning model that classifies text into categories. I am 99 percent done; however, I do not know how to merge my prediction results back into the original dataframe so that I can print what I started with alongside what the prediction was.

Here is my code:

#imports data from excel file and shows first 5 rows of data
file_name = r'C:\Users\aac1928\Documents\Machine Learning\Training        Data\RFP Training Data.xlsx'
sheet = 'Sheet1'

import pandas as pd
import numpy
import xlsxwriter
import sklearn

df = pd.read_excel(io=file_name,sheet_name=sheet)

# extracts specific columns (first and third) from the data
data = df.iloc[: , [0,2]]
print(data)

#Gets data ready for model
newdata = df.iloc[:,[1,2]]
newdata = newdata.rename(columns={'Label':'label'})
newdata = newdata.rename(columns={'RFP Question':'question'})
print(newdata)

# define X and y for use with CountVectorizer
X = newdata.question
y = newdata.label
print(X.shape)
print(y.shape)

# split X and y into training and testing sets
X_train = X
y_train = y
X_test = newdata.question[:50]
y_test = newdata.label[:50]
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

# import and instantiate CountVectorizer (with the default parameters)
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# learn the vocabulary and transform the training data into a document-term matrix in a single step
X_train_dtm = vect.fit_transform(X_train)

# transform testing data (using fitted vocabulary) into a document-term matrix
X_test_dtm = vect.transform(X_test)
X_test_dtm

# import and instantiate a logistic regression model
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

# train the model using X_train_dtm
%time logreg.fit(X_train_dtm, y_train)

# make class predictions for X_test_dtm
y_pred_class = logreg.predict(X_test_dtm)
y_pred_class

# calculate predicted probabilities for X_test_dtm (well calibrated)
y_pred_prob = logreg.predict_proba(X_test_dtm)[:, 1]
y_pred_prob

# calculate accuracy
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred_class)

This is my new data, added to make predictions from, with the same length as the array:

# split X and y into training and testing sets
X_train = X
y_train = y
X_testnew = dfpred.question
y_testnew = dfpred.label
print(X_train.shape)
print(X_testnew.shape)
print(y_train.shape)
print(y_testnew.shape)

(447,) (168,) (447,) (168,)

# transform new testing data (using fitted vocabulary) into a document-term matrix
X_test_dtm_new = vect.transform(X_testnew)
X_test_dtm_new

<168x1382 sparse matrix of type '' with 2240 stored elements in Compressed Sparse Row format>

# make class predictions for new X_test_dtm
# (nb is not defined in the code shown; presumably a fitted classifier such as logreg above)
y_pred_class_new = nb.predict(X_test_dtm_new)
y_pred_class_new

array([ 3, 3, 19, 18, 5, 10, 10, 5, 19, 3, 3, 3, 5, 3, 3, 3, 3, 9, 19, 5, 5, 10, 9, 5, 18, 19, 9, 9, 19, 19, 18, 18, 18, 4, 18, 3, 9, 18, 19, 19, 18, 19, 5, 19, 19, 3, 3, 18, 18, 5, 18, 3, 4, 5, 6, 4, 5, 19, 19, 5, 5, 19, 19, 4, 5, 18, 5, 5, 19, 5, 18, 5, 19, 18, 19, 5, 7, 5, 9, 9, 9, 9, 10, 9, 9, 5, 5, 5, 5, 3, 18, 4, 9, 5, 3, 6, 9, 18, 7, 5, 9, 5, 5, 19, 5, 5, 19, 5, 6, 5, 5, 6, 9, 21, 10, 9, 18, 9, 9, 3, 18, 5, 6, 18, 6, 3, 6, 5, 18, 6, 5, 18, 5, 6, 7, 7, 5, 7, 19, 18, 6, 5, 5, 5, 5, 5, 19, 16, 5, 19, 5, 5, 5, 5, 19, 5, 7, 19, 6, 7, 3, 18, 18, 18, 6, 19, 19, 7], dtype=int64)

# calculate predicted probabilities for X_test_dtm (well calibrated)
y_pred_prob_new = logreg.predict_proba(X_test_dtm_new)[:, 1]
y_pred_prob_new

df['prediction'] = pd.Series(y_pred_class_new)

dfout = pd.merge(dfpred, df['prediction'].dropna().to_frame(), how='left', left_index=True, right_index=True)

print(dfout)

I hope this helps. I am trying to be as clear as possible.

sacuL · Accepted Answer

Since your predictions are just an array, I think you'll be better off assigning them directly:

df['predictions'] = y_pred_class
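To make the idea concrete, here is a minimal sketch of direct assignment. It assumes the predictions should be attached to the frame they were actually made from (`dfpred` in the question, since that is where `X_testnew` came from) rather than to the training frame. The small frame and the prediction array below are hypothetical stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame the predictions were made from
# (dfpred in the question).
dfpred = pd.DataFrame(
    {"question": ["q one", "q two", "q three"], "label": [3, 5, 19]},
    index=[10, 11, 12],  # deliberately not 0..n-1
)

# Hypothetical stand-in for the array returned by the model's predict().
y_pred_class_new = np.array([3, 5, 18])

# The array must hold exactly one prediction per row of the frame.
assert len(y_pred_class_new) == len(dfpred)

# Assigning the raw array is positional: row i receives prediction i.
dfout = dfpred.copy()
dfout["prediction"] = y_pred_class_new

# Wrapping the array in a Series first aligns on index labels 0..n-1,
# so rows whose index label has no match end up as NaN.
misaligned = dfpred.copy()
misaligned["prediction"] = pd.Series(y_pred_class_new)

print(dfout)
print(misaligned)
```

Because a plain array carries no index, pandas assigns it row by row, so no merge is needed. The `pd.Series(...)` route in the question only lines up when the target frame has a default 0..n-1 index and the same number of rows as the array.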
