msal

Reputation: 13

ValueError: Shape of passed values is (39, 1), indices imply (39, 7)

Hi, I am trying to run a simple logistic regression on my data, with returns as the target (y) and market indices as the features (x).

import numpy as np
import pandas as pd
from sklearn import metrics

data = pd.read_excel ('Datafile.xlsx', index_col=0)

#split dataset into features and target variable
col_features = ['Market Beta','Value','Size','High-Yield Spread','Term Spread','Momentum','Bid-Ask Spread']
target=['Return']
x = data[col_features] #features
y = data[target] #target

#split x and y into training and testing datasets
from sklearn.model_selection import train_test_split
x_train, y_train, x_test, y_test = train_test_split (x, y, test_size = 0.25, random_state = 0)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
y_train = np.argmax(y_train, axis=1)
logreg.fit(x_train, y_train)

y_pred = logreg.predict(x_test)

The error I get is ValueError: Shape of passed values is (39, 1), indices imply (39, 7)

Thank you.

Upvotes: 0

Views: 430

Answers (1)

Poe Dator

Reputation: 4893

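You swapped the order of the values returned by train_test_split, so x_test and y_train are switched. As a result, y_train actually holds the 7-column test features, which is consistent with the (39, 7) in the error message. The correct order is: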

x_train, x_test, y_train, y_test = train_test_split(x, y, ...
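
To make the return order concrete, here is a minimal sketch on synthetic data rather than the asker's spreadsheet. The 156-row size is an assumption, chosen only so that a 25% test split has 39 rows like the error message, and the binary target is a hypothetical stand-in for whatever class labels the real data would use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Datafile.xlsx (assumption): 156 rows, 7 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(156, 7))

# Hypothetical binary target: 1 if a simulated return is positive, else 0.
simulated_returns = x @ rng.normal(size=7) + rng.normal(scale=0.5, size=156)
y = (simulated_returns > 0).astype(int)

# train_test_split returns the two splits of each input in turn:
# (x_train, x_test) first, then (y_train, y_test).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

print(x_train.shape, x_test.shape)  # (117, 7) (39, 7)
print(y_train.shape, y_test.shape)  # (117,) (39,)

# With the splits in the right order, fit and predict line up.
logreg = LogisticRegression()
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
```

Printing the shapes right after the split is a cheap way to catch this kind of mix-up: the second returned value has 7 columns, so it cannot be a target.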

Upvotes: 1
