I have a pandas DataFrame (called df) that looks like this when printed to the console:
date 2019-09-03 00:00:00 ... OverallAtt
students ...
5c48943cbe8e95292564e163 0.0 ... 78.321678
5c48943dbe8e95292564e165 100.0 ... 87.500000
5c48943dbe8e95292564e166 100.0 ... 86.713287
5c48943dbe8e95292564e167 100.0 ... 95.804196
5c48943dbe8e95292564e169 100.0 ... 100.000000
5c48943dbe8e95292564e16b 100.0 ... 98.601399
5c48943dbe8e95292564e16d 100.0 ... 85.314685
5c48943dbe8e95292564e173 100.0 ... 96.503497
5c48943dbe8e95292564e175 100.0 ... 83.216783
However, when I try to select the students column and put it in a separate variable, like this:
Names = df['students']
It comes up with this error:
KeyError: 'students'
Does anyone know why it won't work?
UPDATE:
That is fixed now; however, I am getting another error when I try to print the predicted values. Here is my code:
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
y = dataset['OverallAtt'] #Total Attendance ThisYear
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
import pickle
filename='Regressor_model.sav'
pickle.dump(regressor, open(filename, 'wb'))
load_lr_model =pickle.load(open(filename, 'rb'))
#PREDICT FROM NEW DATA
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
ActualAttendance = dataset['OverallAtt']
Names = df.reset_index(drop=False)['students']
NewX_test = (X)
y_load_predit=load_lr_model.predict(NewX_test)
Newdf = pd.DataFrame({'Full Name': Names, 'Actual Attendance': ActualAttendance, 'Predicted Attendance': y_load_predit})
print(Newdf)
I am getting this error:
ValueError: array length 77 does not match index length 459
ActualAttendance and Names both have length 382, and y_load_predit is an array of length 382 as well, so I am not sure why I am getting this error.
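One possible cause, sketched here with made-up data: Names (after reset_index) carries a fresh integer index, while ActualAttendance is still indexed by student ID. When pd.DataFrame receives a dict of Series, it aligns them on the union of their indexes, and a plain array must then match that union's length. The reported numbers fit this pattern, since 459 = 382 + 77, which suggests a 77-row object (possibly the test split) was involved in the run that failed. The toy IDs and values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the names "students" and "OverallAtt"
# come from the question.
df = pd.DataFrame(
    {"OverallAtt": [78.3, 87.5, 86.7]},
    index=pd.Index(["s1", "s2", "s3"], name="students"),
)

actual = df["OverallAtt"]                        # indexed by s1, s2, s3
names = df.reset_index(drop=False)["students"]   # indexed by 0, 1, 2
predicted = np.array([80.0, 85.0, 90.0])         # plain array, no index

# The two Series share no labels, so their union has 3 + 3 entries.
union_length = len(actual.index.union(names.index, sort=False))

# Mixing the misaligned Series with a 3-element array fails.
error_message = None
try:
    pd.DataFrame({"Full Name": names,
                  "Actual Attendance": actual,
                  "Predicted Attendance": predicted})
except ValueError as exc:
    error_message = str(exc)

# One way around it: pass positional values so nothing is aligned by label.
fixed = pd.DataFrame({
    "Full Name": df.index.to_numpy(),
    "Actual Attendance": actual.to_numpy(),
    "Predicted Attendance": predicted,
})
print(error_message)
print(fixed)
```

Passing plain arrays sidesteps label alignment entirely; it relies on all three inputs being in the same row order, which holds here because they all come from the same df.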
Upvotes: 0
Views: 68
Reputation: 3108
It looks like students is the name of your index rather than a column, which is why df['students'] raises a KeyError. To access it, you can reset the index so that it becomes a regular column:
Names = df.reset_index(drop=False)['students']
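As a minimal sketch of the difference, assuming a toy frame with hypothetical IDs and values (only the names students and OverallAtt are taken from the question):

```python
import pandas as pd

# Hypothetical toy data; "students" is the index name, not a column.
df = pd.DataFrame(
    {"OverallAtt": [78.3, 87.5, 86.7]},
    index=pd.Index(["s1", "s2", "s3"], name="students"),
)

# "students" labels the index, so it is not among the columns.
is_column = "students" in df.columns
is_index_name = df.index.name == "students"

# Option 1: turn the index into a regular column, then select it.
names = df.reset_index(drop=False)["students"]

# Option 2: read the labels straight from the index.
names_from_index = df.index.to_list()

print(is_column, is_index_name)
print(names_from_index)
```

Note that the Series returned by option 1 has a new integer index (0, 1, 2, ...), not the student IDs, which matters if it is later combined with objects that are still indexed by student.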
Upvotes: 1