Ahmad Anis

Reputation: 2724

ValueError: y should be a 1d array, got an array of shape instead

I know this has been asked many times, but I cannot figure it out.

I have a dataset in the following format: the first 767 columns hold the training features, and the next 669 columns are the labels.

The labels are one-hot vectors, i.e. [0, 0, 0, ..., 1, 0, 0], which is why there are 669 label columns. I want to train a model on this data using XGBoost. My code is:

self.clf = XGBClassifier(objective="multi:softmax", num_classes=669)
data = single_data.iloc[:, 0:767]
label = single_data.iloc[:, 767:]
self.clf.fit(data, label)

The error I get is

ValueError: y should be a 1d array, got an array of shape (1638, 670) instead.

How can I solve this? Thanks

Answers (1)

Chris

Reputation: 16182

I'm assuming your label columns look like this:

import pandas as pd
label = pd.DataFrame({'c0':[0,1,0,0,0], 'c1':[1,0,0,0,0], 'c2':[0,0,1,1,0], 'c3':[0,0,0,0,1]})
print(label)

Output

   c0  c1  c2  c3
0   0   1   0   0
1   1   0   0   0
2   0   0   1   0
3   0   0   1   0
4   0   0   0   1

Then convert each one-hot row to an integer class index (the position of its 1):

label = label.apply(lambda x: x.argmax(), axis=1).values

Now your labels are a single 1-D array:

array([1, 0, 2, 2, 3], dtype=int64)
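A vectorized alternative to the row-wise `apply` is to call NumPy's `argmax` along `axis=1` on the underlying array. The sketch below wraps that in a helper and also checks that every row contains exactly one 1; the helper names `one_hot_to_labels` and `labels_to_one_hot` are hypothetical, not part of pandas or XGBoost. The check may be worth running here: the question says there are 669 label columns, but the error reports a label array of shape (1638, 670), so the slice `iloc[:, 767:]` might include a column that is not part of the one-hot encoding.

```python
import numpy as np
import pandas as pd


def one_hot_to_labels(one_hot: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: convert a one-hot frame of shape
    (n_samples, n_classes) into a 1-D array of class indices."""
    values = one_hot.to_numpy()
    # A valid one-hot row contains exactly one 1, so each row sums to 1.
    row_sums = values.sum(axis=1)
    if not np.all(row_sums == 1):
        bad_rows = np.flatnonzero(row_sums != 1)
        raise ValueError(f"rows are not one-hot: {bad_rows[:10].tolist()}")
    # argmax along axis=1 returns the column position of the 1 in each row.
    return values.argmax(axis=1)


def labels_to_one_hot(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Hypothetical inverse helper: map class indices back to one-hot rows."""
    # Indexing the identity matrix with the labels picks one row per sample.
    return np.eye(n_classes, dtype=int)[labels]


# Same example label frame as above.
label = pd.DataFrame({'c0': [0, 1, 0, 0, 0], 'c1': [1, 0, 0, 0, 0],
                      'c2': [0, 0, 1, 1, 0], 'c3': [0, 0, 0, 0, 1]})
y = one_hot_to_labels(label)
print(y)        # [1 0 2 2 3]
print(y.shape)  # (5,)
```

The resulting 1-D array can be passed as `y`, e.g. `self.clf.fit(data, one_hot_to_labels(label))`. Predictions from `predict` are then class indices as well, which `labels_to_one_hot` can turn back into one-hot rows if needed. As far as I know, XGBoost's own parameter is spelled `num_class` (not `num_classes`), and the scikit-learn wrapper infers the number of classes from `y`, so it should not need to be passed explicitly.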
