JJL

Reputation: 53

XGBoost Python error: "Size of labels must equal to number of rows"

I am using xgboost in Python.

import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

df=pd.read_csv('442.csv')
y=df.columnone
X=df.columnfive

X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=Y_train)
dtest = xgb.DMatrix(X_test, label=Y_test)

The shape of the labels seems to match the training set:

X_train.shape
>(405020,)
Y_train.shape
>(405020,)

param = {
   'eta': 0.3,
   'max_depth': 3, 
   'objective': 'multi:softprob', 
   'num_class': 2}
steps = 20  # The number of training iterations

But running the training step gives me this error:

model = xgb.train(param, dtrain, steps)
>XGBoostError: Check failed: labels_.Size() == num_row_ (405020 vs. 1) : Size of labels must equal to number of rows.

When I run

dtrain.num_row()
>1
dtrain.num_col()
>405020

This might have something to do with the error, but I still have no idea how it could have happened: my initial X and y variables both have the correct number of rows and one column each.

Upvotes: 4

Views: 8824

Answers (3)

Kevin

Reputation: 4293

For me, this error came from the subset of rows I was using to test more quickly: that subset didn't contain at least one example of each label, which triggered this error.
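One quick way to guard against that (a sketch, reusing the variable names and imports from the question; the num_class value matches the one in the param dict) is to check that every class is present before building the DMatrix, or to stratify the split:

import numpy as np
from sklearn.model_selection import train_test_split

# Make sure the sample used for quick testing still contains every class.
num_class = 2
present = np.unique(Y_train)
assert len(present) == num_class, f"only classes {present} are present in Y_train"

# Alternatively, stratify the split so every class keeps its proportion.
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, stratify=y)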

Upvotes: 0

sa-

Reputation: 1

I got the same error, but it was because of a bug in my code:

x = trainset[feature_cols + dummy_var_cols]
y = testset[[label_column]]

dtrain = xgb.DMatrix(x, y)

Can you spot it? My y variable was coming from my test set and not my train set!
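For reference, the corrected version takes the labels from the same split as the features (a sketch reusing the names from the snippet above):

# Features and labels must come from the same dataset.
x = trainset[feature_cols + dummy_var_cols]
y = trainset[[label_column]]  # was testset[[label_column]], hence the row-count mismatch

dtrain = xgb.DMatrix(x, y)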

Upvotes: 0

Igor Rivin

Reputation: 4864

XGBoost expects a 2-D array of inputs and a 1-D vector of labels. You are giving it two 1-D vectors, so it gets confused. Using df[["columnfive"]] for the input should work, since the double brackets return a DataFrame rather than a Series.
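Applied to the code in the question, that means selecting the feature column with double brackets so X stays two-dimensional (a sketch using the column names from the question):

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

df = pd.read_csv('442.csv')
y = df["columnone"]      # labels: a 1-D Series is fine here
X = df[["columnfive"]]   # features: double brackets keep a 2-D DataFrame

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=Y_train)  # num_row() now matches len(Y_train)
dtest = xgb.DMatrix(X_test, label=Y_test)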

Upvotes: 3
