user12559668


Error Selecting a Column in Python Dataframe

I have a pandas DataFrame (called df) that looks like this when printed in the console:

    date                      2019-09-03 00:00:00  ...  OverallAtt
    students                                       ...            
    5c48943cbe8e95292564e163                  0.0  ...   78.321678
    5c48943dbe8e95292564e165                100.0  ...   87.500000
    5c48943dbe8e95292564e166                100.0  ...   86.713287
    5c48943dbe8e95292564e167                100.0  ...   95.804196
    5c48943dbe8e95292564e169                100.0  ...  100.000000
    5c48943dbe8e95292564e16b                100.0  ...   98.601399
    5c48943dbe8e95292564e16d                100.0  ...   85.314685
    5c48943dbe8e95292564e173                100.0  ...   96.503497
    5c48943dbe8e95292564e175                100.0  ...   83.216783

However, when I try to select the students column and put it in a separate variable, like this:

Names = df['students']

It comes up with this error:

KeyError: 'students'

Does anyone know why it won't work?

UPDATE:

That is fixed now; however, I am getting another error when I try to print the predicted values. Here is my code:

import pandas as pd

dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
y = dataset['OverallAtt']  # total attendance this year

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

import pickle
filename = 'Regressor_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(regressor, f)

with open(filename, 'rb') as f:
    load_lr_model = pickle.load(f)

#PREDICT FROM NEW DATA
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
ActualAttendance = dataset['OverallAtt']
Names = df.reset_index(drop=False)['students']

NewX_test = X
y_load_predit = load_lr_model.predict(NewX_test)
Newdf = pd.DataFrame({'Full Name': Names, 'Actual Attendance': ActualAttendance, 'Predicted Attendance': y_load_predit})
print(Newdf)

I am getting this error:

ValueError: array length 77 does not match index length 459

ActualAttendance and Names both have length 382, and y_load_predit is also an array of length 382, so I'm not sure why I'm getting this error.
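One likely cause, offered as a guess rather than a confirmed diagnosis: Names comes from reset_index, so it is indexed 0 to 381, while ActualAttendance is still indexed by student ID. When pd.DataFrame receives a dict that contains Series, it aligns them on the union of their index labels, so a plain NumPy array such as y_load_predit no longer matches the length of that combined index. (459 is 382 + 77, which would fit a 382-row Series on a 0-based index combined with a 77-row Series keyed by student ID, but that is an inference.) The sketch below reproduces the effect on a hypothetical two-student frame; the IDs 's1' and 's2' and the predicted values are made up, and passing plain arrays is just one possible fix.

```python
import numpy as np
import pandas as pd

# Hypothetical two-student stand-in for df (real IDs replaced by 's1', 's2').
df = pd.DataFrame(
    {"OverallAtt": [78.321678, 87.5]},
    index=pd.Index(["s1", "s2"], name="students"),
)

actual = df["OverallAtt"]                       # Series indexed by 's1', 's2'
names = df.reset_index(drop=False)["students"]  # Series indexed by 0, 1
predicted = np.array([80.0, 90.0])              # made-up stand-in for predict() output

# Series with different indexes are aligned on the union of their labels
# (2 IDs + 2 integers = 4 rows), so the 2-element array no longer fits.
error_message = ""
try:
    pd.DataFrame({
        "Full Name": names,
        "Actual Attendance": actual,
        "Predicted Attendance": predicted,
    })
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One possible fix: pass plain arrays so that no index alignment happens.
fixed = pd.DataFrame({
    "Full Name": df.index.to_numpy(),
    "Actual Attendance": actual.to_numpy(),
    "Predicted Attendance": predicted,
})
print(fixed)
```

Another option would be to keep every column keyed by student ID, for example by building Names from df.index.to_series() instead of from reset_index, so that all the Series share a single index of the same length as the prediction array.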


Answers (1)

ndclt


It looks like students is your index name, not a column. To get it, you can reset your index:

Names = df.reset_index(drop=False)['students']
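A minimal reproduction may make the distinction clearer. It assumes a hypothetical two-row frame whose IDs 's1' and 's2' stand in for the real ones. The students label is printed on its own row because it is the index name, so it does not appear in df.columns and df['students'] raises KeyError. reset_index turns the index into an ordinary column, while df.index or df.index.to_series() read it without resetting.

```python
import pandas as pd

# Hypothetical miniature of the question's df: 'students' is the index *name*,
# not a column; 'OverallAtt' is an ordinary column.
df = pd.DataFrame(
    {"OverallAtt": [78.321678, 87.5]},
    index=pd.Index(["s1", "s2"], name="students"),
)

print(df.index.name)             # students
print("students" in df.columns)  # False

# Selecting it as a column therefore fails.
raised_key_error = False
try:
    df["students"]
except KeyError:
    raised_key_error = True

# The fix from this answer: move the index back into a regular column.
names = df.reset_index(drop=False)["students"]

# Alternatives that read the index directly instead.
names_index = df.index                # pandas Index of the IDs
names_series = df.index.to_series()   # Series whose index is also the IDs
```

One side effect worth knowing: the Series returned by reset_index(...)['students'] carries a fresh 0..n-1 index rather than the student IDs, whereas df.index.to_series() keeps the IDs as its index.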

