Reputation: 4498
I am working with Python and pandas (in a Jupyter notebook), where I created a Random Forest model for the Titanic data set.
https://www.kaggle.com/c/titanic/data
I read in the test and train data, then clean both and add new columns (the same columns to both).
After fitting and re-fitting the model, trying boosting, etc., I decided on one model:
X2 = train_data[['Pclass','Sex','Age','richness']]
rfc_model_3 = RandomForestClassifier(n_estimators=200)
%time cross_val_score(rfc_model_3, X2, Y_target).mean()
rfc_model_3.fit(X2, Y_target)
Then I predict whether each passenger survived:
X_test = test_data[['Pclass','Sex','Age','richness']]
predictions = rfc_model_3.predict(X_test)
preds = pd.DataFrame(predictions, columns=['Survived'])
Is there a way for me to add the predictions as a column in the test file?
Upvotes: 4
Views: 2939
Reputation: 42875
Given
rfc_model_3 = RandomForestClassifier(n_estimators=200)
the call
rfc_model_3.predict(X_test)
returns y : array of shape = [n_samples] (see docs), so you should be able to add the model output directly to X_test without creating an intermediate DataFrame:
X_test['survived'] = rfc_model_3.predict(X_test)
If you want the intermediate result anyway, @EdChum's suggestion in the comments would work fine.
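For reference, here is a minimal, self-contained sketch of both routes. The toy frames, their values, the index labels, and the name `features` are invented for illustration and stand in for the cleaned Titanic data. Because `X_test` in the question is a slice of `test_data`, assigning to it may raise a `SettingWithCopyWarning`, so this sketch assigns to `test_data` itself; the intermediate-DataFrame variant is one possible way to do it, not necessarily what was suggested in the comments.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for the cleaned Titanic frames (all values are invented).
features = ['Pclass', 'Sex', 'Age', 'richness']
train_data = pd.DataFrame({
    'Pclass': [1, 3, 2, 3, 1, 2],
    'Sex': [0, 1, 0, 1, 1, 0],
    'Age': [29.0, 22.0, 35.0, 4.0, 54.0, 27.0],
    'richness': [3, 1, 2, 1, 3, 2],
    'Survived': [1, 0, 1, 0, 0, 1],
})
test_data = pd.DataFrame({
    'Pclass': [3, 1, 2],
    'Sex': [1, 0, 0],
    'Age': [30.0, 40.0, 18.0],
    'richness': [1, 3, 2],
}, index=[892, 893, 894])

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train_data[features], train_data['Survived'])

# predict() returns a 1-D NumPy array with one entry per input row.
predictions = model.predict(test_data[features])

# Route 1: assign the array directly as a new column of the full test frame.
test_data['Survived'] = predictions

# Route 2: build an intermediate DataFrame, reusing the test index so that
# the rows line up when the two frames are concatenated column-wise.
preds = pd.DataFrame(predictions, columns=['Survived'], index=test_data.index)
combined = pd.concat([test_data[features], preds], axis=1)
```

Direct assignment works because the array has the same length and row order as the frame. In the second route, passing `index=test_data.index` matters whenever the test frame does not use the default 0..n-1 index, since `pd.concat` aligns rows by index label.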
Upvotes: 4