Reputation: 45
I'm not sure how to get rid of this error. Below is an example of my dataset. Is there a step that I'm missing?
Code below:
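import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# re_arrange is my DataFrame (loading code not shown)
models = RandomForestClassifier(n_estimators=100)
np.random.seed(42)  # make the split and the forest reproducible
X = re_arrange.drop('Gender', axis=1)  # features: every column except the target
y = re_arrange['Gender']  # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
models.fit(X_train, y_train)  # raises an error if X contains non-numeric (text) columns
print(models.score(X_test, y_test))  # mean accuracy on the test set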
Upvotes: 1
Views: 1160
Reputation: 1939
RandomForestClassifier accepts only numeric values in its features. As you can see, almost all of your features contain text (object) data. So first of all, run X.info() to see the data type of each feature. Any feature with a string or object dtype has to be encoded as numbers, for example with OneHotEncoder, or with OrdinalEncoder for a simple label-style encoding (LabelEncoder is intended for the target, not for features).
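A minimal sketch of that advice, assuming a small hypothetical DataFrame in place of re_arrange (the real data is not shown; the "City" and "Rating" columns are invented for illustration). It lists the non-numeric columns, one-hot encodes them with OneHotEncoder inside a ColumnTransformer, and passes the numeric columns through unchanged:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for re_arrange; "City" and "Rating" are invented columns
re_arrange = pd.DataFrame({
    "Branch": ["A", "B", "C", "A", "B", "C", "A", "B"],
    "City": ["X", "Y", "X", "Y", "X", "Y", "X", "Y"],
    "Rating": [9.1, 7.4, 8.4, 6.5, 5.9, 8.8, 7.2, 6.1],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male"],
})

X = re_arrange.drop("Gender", axis=1)
y = re_arrange["Gender"]

X.info()  # non-numeric dtypes here are the columns that need encoding

# Collect every column that is not numeric
categorical_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]

preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",  # numeric columns are passed through unchanged
)

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # mean accuracy on the test split
print(score)
```

Keeping the encoder inside the Pipeline means the categories are learned from X_train only, and handle_unknown="ignore" keeps prediction from failing if the test split contains a category the training split never saw.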
Upvotes: 0
Reputation: 2311
Your column "Branch" contains letters, whereas RandomForestClassifier expects numbers. I believe it is a categorical feature, so you can encode "Branch" with a categorical encoding, as shown below, before you do the train/test split:
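X = pd.get_dummies(X, columns=["Branch"], dtype=int)
This replaces the letters 'A', 'B', etc. with one 0/1 indicator column per letter (Branch_A, Branch_B, ...). It does not change the information in your data; it only converts it into a form the model can compute with.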
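A small sketch of that encoding on a hypothetical frame (the "Rating" column is invented for illustration). Assigning the multi-column result of pd.get_dummies back to the single "Branch" column raises an error in pandas, so here the whole frame is encoded with the columns= argument, which drops "Branch" and appends one indicator column per category:

```python
import pandas as pd

# Hypothetical feature frame; only "Branch" comes from the question
X = pd.DataFrame({"Branch": ["A", "B", "C", "A"], "Rating": [9.1, 7.4, 8.4, 6.5]})

# Replace "Branch" with one 0/1 indicator column per category
X_encoded = pd.get_dummies(X, columns=["Branch"], dtype=int)
print(X_encoded)
```

Encoding the full X before the split, as suggested above, guarantees that the training and test sets end up with the same set of dummy columns.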
Upvotes: 2