dhan004

Reputation: 45

How to fix this error: ValueError: could not convert string to float: 'A'

I'm not sure how to get rid of this error. Below is my example dataset. Is there another step that I'm missing?

[Image: screenshot of the example dataset]

Code below: 
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

np.random.seed(42)  # seed NumPy's global RNG so the split and the forest are reproducible
models = RandomForestClassifier(n_estimators=100)

X = re_arrange.drop('Gender',axis=1) 
y = re_arrange['Gender']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

models.fit(X_train,y_train)
models.score(X_test, y_test)

Upvotes: 1

Views: 1160

Answers (2)

Mehul Gupta

Reputation: 1939

RandomForestClassifier can only handle numerical values in its features. As you can see, almost all of your features contain text/object data. So first of all, run X.info() to see the data type of each feature. Any feature whose dtype is 'string' or 'object' needs to be encoded as numbers, using one-hot encoding or label encoding.

One-Hot-Encoding

LabelEncoding

Upvotes: 0

sam

Reputation: 2311

Your column "Branch" contains letters, whereas RandomForestClassifier expects numbers. I believe it is a categorical column, so you can encode it with a categorical encoding such as the one below, before you do the train/test split:

X = pd.get_dummies(X, columns=["Branch"])

This replaces "Branch" with one indicator column per letter ('A', 'B', etc.). It does not change the information in your data; it just converts it into a computation-friendly form.
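
Put together with the code from the question, the whole flow might look like the sketch below. The data is invented for illustration (the "Age" column and all values are assumptions, and only "Branch" and "Gender" are named in the thread), and random_state is used here in place of np.random.seed as one way to get reproducible results.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's re_arrange DataFrame.
re_arrange = pd.DataFrame({
    "Branch": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "Age": [23, 35, 41, 29, 52, 19, 33, 47, 26, 38],
    "Gender": ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
})

X = re_arrange.drop("Gender", axis=1)
y = re_arrange["Gender"]

# Encode before splitting so the train and test sets share the same columns.
X = pd.get_dummies(X, columns=["Branch"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)  # no string-to-float error: all features are numeric
accuracy = model.score(X_test, y_test)
print(accuracy)
```

The target y can stay as strings; only the feature columns need to be numeric. If other feature columns also contain text, they would need the same treatment, for example by listing them in the columns argument.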

Upvotes: 2
