Reputation: 13
Hey, I am trying to run a simple logistic regression on my data: returns (y) versus market indices (x).
import numpy as np
import pandas as pd
from sklearn import metrics
data = pd.read_excel ('Datafile.xlsx', index_col=0)
#split dataset into features and target variable
col_features = ['Market Beta','Value','Size','High-Yield Spread','Term Spread','Momentum','Bid-Ask Spread']
target=['Return']
x = data[col_features] #features
y = data[target] #target
#split x and y into training and testing datasets
from sklearn.model_selection import train_test_split
x_train, y_train, x_test, y_test = train_test_split (x, y, test_size = 0.25, random_state = 0)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
y_train = np.argmax(y_train, axis=1)
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
The error I get is ValueError: Shape of passed values is (39, 1), indices imply (39, 7)
Thank you.
Upvotes: 0
Views: 430
Reputation: 4893
You just mixed up the order of the train_test_split results, so x_test and y_train were swapped. The proper order is:
x_train, x_test, y_train, y_test = train_test_split(x, y, ...
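To make the ordering concrete, here is a minimal sketch on synthetic data. The 156-row size is an assumption, picked only so that a 25% test split gives the 39 rows seen in the error message, and the binary target is also an assumption (for example, 1 if the return is positive), since the real spreadsheet is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spreadsheet: 156 rows, 7 features (assumed sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(156, 7))
# Assumption: the target is a binary label, not the raw return.
y = (x[:, 0] + 0.5 * rng.normal(size=156) > 0).astype(int)

# train_test_split returns the two splits of each input in turn:
# first (x_train, x_test), then (y_train, y_test).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

print(x_train.shape, x_test.shape)  # (117, 7) (39, 7)
print(y_train.shape, y_test.shape)  # (117,) (39,)

logreg = LogisticRegression()
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
```

With the original unpacking order, the name y_train actually held the 39-row, 7-column test features, which is where the shape mismatch came from. Separately, once the order is fixed, np.argmax(y_train, axis=1) on a single-column target returns 0 for every row, so that line probably needs to be dropped as well, and LogisticRegression expects discrete class labels rather than continuous returns.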
Upvotes: 1