Reputation: 31
I've looked at a couple of other posts with this issue, and I cannot figure out what I'm getting wrong here.
I have X_data and Y_data, and both have the shape (200000, 6). Sample output from them looks like this:
X_data:
(200000, 6)
[[ 0.00237987 0.00237987 -0.00075756 -0.00221595 -0.00368199 0.00019625]
[ 0.00171481 0.00171481 0.00176989 0.00125255 0.00275689 -0.00111833]
[ 0.00190234 0.00190234 0.00333571 0.00127516 0.00146631 -0.00240469]
...
[ 0.00211437 0.00211437 0.00221987 0.0002214 0.00273094 -0.00114419]
[ 0.00185682 0.00185682 0.00352099 0.00064055 -0.00051575 0.00335213]
[ 0.00155133 0.00155133 -0.00368774 -0.00200935 0.00225988 -0.00161371]]
Y_data:
(200000, 6)
[[1. 0.14713856 0.04063819 0.03123633 0.00239176 0.01674091]
[1. 0.35532772 0.09834969 0.19631962 0.0153588 0.10071312]
[1. 0.17015225 0.04700213 0.04208244 0.00322773 0.02244747]
...
[1. 0.14534398 0.04014234 0.03046259 0.0023313 0.01633189]
[1. 0.18606737 0.05138638 0.0368341 0.00281708 0.01979553]
[1. 0.31199003 0.0863072 0.14879644 0.01157114 0.07705023]]
As soon as I call train_test_split, as follows:
ts1 = 0.2
rs1 = 42
X_train, X_test, Y_train, Y_test = train_test_split(X_data, Y_data[0], test_size = ts1, random_state = rs1)
My code crashes with a ValueError. I have no idea where I'm going wrong.
Upvotes: 0
Views: 116
Reputation: 274
It seems like the first column of your Y_data matrix is the label for your X data (I'm not sure what the other 5 columns in your Y_data represent). Y_data[0] selects the first row, which isn't what you want: it has only 6 elements, whereas you need one y-value for each of the 200000 rows in X_data. So the code I think you want is
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data, Y_data[:, 0], test_size=ts1, random_state=rs1)
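To see the difference between the two indexing forms, here is a minimal sketch. The arrays are random stand-ins that I made up (1000 rows instead of 200000, same 6 columns), not your real data. It assumes the first column of Y_data really is the target, which is only my guess.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the arrays in the question (assumed, smaller size).
rng = np.random.default_rng(0)
X_data = rng.normal(size=(1000, 6))
Y_data = rng.random(size=(1000, 6))

print(Y_data[0].shape)     # first ROW: 6 values
print(Y_data[:, 0].shape)  # first COLUMN: one value per sample

# Passing the first row gives X and y different lengths (1000 vs. 6),
# so train_test_split refuses to split them.
try:
    train_test_split(X_data, Y_data[0], test_size=0.2, random_state=42)
    row_split_failed = False
except ValueError as err:
    row_split_failed = True
    print("ValueError:", err)

# Passing the first column keeps one y-value per row of X.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data, Y_data[:, 0], test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

If all six columns of Y_data are actually targets, passing Y_data unchanged should also work, since it has the same number of rows as X_data.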
Upvotes: 1