Reputation: 1482
I'm trying to pass a feature and label numpy array into train_test_split. The features are a single column (datetime dtype converted to integer). There are 900 observations in the labels array.
features.shape returns (1101, 1)
labels.shape returns (1101, 900)
Before splitting into feature and label arrays I did df.fillna(0, inplace=True) because I had originally thought NaN values were the issue.
Here is the block I'm running:
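import numpy as np
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor
# np.array replaces pd.np.array: the pd.np alias was removed in pandas 2.0
X_train, X_test, y_train, y_test = train_test_split(np.array(features), np.array(labels), train_size=0.75, test_size=0.25)
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)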
The exception occurs on the train_test_split line. Here is the exception:
ValueError: Error: Input data is not in a valid format. Please confirm that the input data is scikit-learn compatible. For example, the features must be a 2-D array and target labels must be a 1-D array.
What is causing this?
Upvotes: 1
Views: 929
Reputation: 1482
It turns out TPOT cannot solve multi-label regression problems at this time. That was my problem: passing in a label array of shape (1101, 900) isn't going to work. If the labels are reduced to a single column, the code works fine.
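For anyone hitting the same error, here is a minimal sketch of what "reduced to a single column" can look like. It uses random stand-in arrays with the shapes from the question, and single_target is a hypothetical helper name of my own, not part of TPOT or scikit-learn. The point is only that the target ends up 1-D, as the error message asks.

```python
import numpy as np

# Stand-in arrays with the shapes from the question (random data, illustration only).
rng = np.random.default_rng(0)
features = rng.integers(0, 10**6, size=(1101, 1))  # 2-D: one feature column
labels = rng.random((1101, 900))                   # 2-D: 900 label columns


def single_target(labels, column):
    """Hypothetical helper: return one label column as a 1-D array."""
    y = np.asarray(labels)[:, column]
    if y.ndim != 1:
        raise ValueError("expected a 1-D target, got shape %r" % (y.shape,))
    return y


y = single_target(labels, 0)
print(features.shape, y.shape)

# With tpot and scikit-learn installed, the split and fit from the question
# would then receive a 1-D target (left as comments here, not executed):
#   X_train, X_test, y_train, y_test = train_test_split(
#       features, y, train_size=0.75, test_size=0.25)
#   tpot.fit(X_train, y_train)
```

If all 900 columns are needed, one option is to repeat this per column and fit a separate regressor for each, though that would be slow with TPOT.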
Upvotes: 3