Cross validation in random forest using anaconda

Question

I'm using the Titanic dataset to predict whether a passenger survived, using a random forest. This is my code:

import numpy as np 
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier
from sklearn import cross_validation
import matplotlib.pyplot as plt
%matplotlib inline

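# raw string so the backslashes in the Windows path are not read as escapes
data=pd.read_csv(r"C:\Users\kabala\Downloads\Titanic.csv")
data.isnull().any()
# fill missing ages with the median age, and missing classes with "3rd"
data["Age"]=data["Age"].fillna(data["Age"].median())
data["PClass"]=data["PClass"].fillna("3rd")
data["PClass"].isnull().any()
data.isnull().any()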
pd.get_dummies(data.Sex)
# choosing the predictive variables 
x=data[["PClass","Age","Sex"]]
# the target variable is y 
y=data["Survived"]
modelrandom=RandomForestClassifier(max_depth=3)
modelrandom=cross_validation.cross_val_score(modelrandom,x,y,cv=5)

But I keep getting this error:

ValueError: could not convert string to float: 'female'

I don't understand what the problem is, because I converted the Sex feature to a dummy variable.

Thanks:)

Answers (1)
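
A likely cause, judging only from the code in the question: pd.get_dummies(data.Sex) returns a new DataFrame and does not modify data, so its result is thrown away and x still holds the strings 'female'/'male' in Sex (and '1st'/'2nd'/'3rd' in PClass). The forest then fails when it tries to convert them to floats. Below is a minimal sketch of one way to fix it. It makes two assumptions that are not in the question: a small made-up DataFrame stands in for Titanic.csv (same column names), and a recent scikit-learn is installed, where cross_val_score lives in sklearn.model_selection rather than the old sklearn.cross_validation module.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Made-up stand-in for Titanic.csv (assumption: same column names as the question).
rng = np.random.default_rng(0)
n = 40
data = pd.DataFrame({
    "PClass": rng.choice(["1st", "2nd", "3rd"], size=n),
    "Age": rng.integers(1, 70, size=n).astype(float),
    "Sex": rng.choice(["female", "male"], size=n),
    "Survived": np.tile([0, 1], n // 2),
})
# Knock out a few values so the fillna steps have something to do.
data.loc[::7, "Age"] = np.nan
data.loc[::10, "PClass"] = np.nan

# Same imputation as in the question.
data["Age"] = data["Age"].fillna(data["Age"].median())
data["PClass"] = data["PClass"].fillna("3rd")

# Keep the result of get_dummies: it returns a new frame with the
# string columns replaced by numeric indicator columns.
x = pd.get_dummies(data[["PClass", "Age", "Sex"]],
                   columns=["PClass", "Sex"], dtype=float)
y = data["Survived"]

model = RandomForestClassifier(max_depth=3, random_state=0)
# cross_val_score returns one accuracy score per fold; keep the model
# and the scores in separate variables.
scores = cross_val_score(model, x, y, cv=5)
print(scores.mean())
```

With the real file, only the block that builds data changes (back to pd.read_csv); the key line is assigning the output of pd.get_dummies to x instead of discarding it.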