Chris Macaluso

Reputation: 1482

How do I fix a NumPy/TPOT array shape error?

I'm trying to pass feature and label NumPy arrays into train_test_split. The features are a single column (a datetime column converted to integers). The labels array has 900 columns.

features.shape returns (1101, 1)

labels.shape returns (1101, 900)
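NumPy reports 2-D shapes as (rows, columns). A shape of (1101, 900) therefore means 1101 rows with 900 values each: 900 target columns per observation, not 900 observations. The sketch below uses placeholder zero-filled arrays with the same shapes. The values are dummies, not the asker's data.

```python
import numpy as np

# Placeholder arrays with the shapes reported in the question (values are dummy zeros).
features = np.zeros((1101, 1))   # 1101 observations, 1 feature column
labels = np.zeros((1101, 900))   # 1101 observations, 900 target columns each

n_obs, n_targets = labels.shape  # rows = observations, columns = targets
print(n_obs, n_targets)          # 1101 900
print(features.ndim, labels.ndim)  # 2 2

# A single-target label vector is 1-D: one value per observation.
single_target = labels[:, 0]
print(single_target.shape)       # (1101,)
```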

Before splitting the data into feature and label arrays, I ran df.fillna(0, inplace=True), because I originally thought NaN values were the problem.

Here is the block I'm running:

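import numpy as np
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

# labels has shape (1101, 900), so y_train and y_test are 2-D here
X_train, X_test, y_train, y_test = train_test_split(np.asarray(features), np.asarray(labels), train_size=0.75, test_size=0.25)
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)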

The exception occurs on the train_test_split line. Here is the exception:

ValueError: Error: Input data is not in a valid format. Please confirm that the input data is scikit-learn compatible. For example, the features must be a 2-D array and target labels must be a 1-D array.
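The message states its own rule: features must be 2-D and target labels must be 1-D. The sketch below applies that rule to arrays shaped like the ones in the question. It also adds the usual scikit-learn requirement that X and y have the same number of rows. The helper name `has_expected_shapes` is hypothetical and is not part of TPOT or scikit-learn. It is not TPOT's actual validation code.

```python
import numpy as np

def has_expected_shapes(X, y):
    """Hypothetical helper: True if X is 2-D, y is 1-D, and both have the same number of rows.

    Mirrors the rule quoted in the error message; not TPOT's real validation code.
    """
    X, y = np.asarray(X), np.asarray(y)
    return X.ndim == 2 and y.ndim == 1 and X.shape[0] == y.shape[0]

X = np.zeros((1101, 1))          # placeholder features, shaped like the question's
Y_multi = np.zeros((1101, 900))  # placeholder labels, shaped like the question's

print(has_expected_shapes(X, Y_multi))        # False: target is 2-D
print(has_expected_shapes(X, Y_multi[:, 0]))  # True: single 1-D target column
```

A column sliced as `Y_multi[:, :1]` still has shape (1101, 1) and fails the check. Calling `.ravel()` on it turns it into a 1-D vector.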

What is causing this?

Upvotes: 1

Views: 929

Answers (1)

Chris Macaluso

Reputation: 1482

It turns out that TPOT cannot currently handle multi-output (multi-target) regression problems. That was my problem: passing in labels with shape (1101, 900) isn't going to work. If the labels are reduced to a single column, the code works fine.
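Reducing the labels to a single column can be written as a minimal sketch. It assumes you want to model each target column separately. The generator below splits a 2-D target array into one single-target problem per column. Each yielded `y` is 1-D, so each problem could go to its own train_test_split / TPOTRegressor run. The helper name `single_target_problems` is hypothetical, and the arrays are zero-filled placeholders.

```python
import numpy as np

def single_target_problems(X, Y):
    """Hypothetical helper: yield (column_index, X, y) with a 1-D y for each column of Y.

    Each yielded problem could be fitted separately, since TPOT accepts one target at a time.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    if Y.ndim == 1:              # already a single target
        yield 0, X, Y
        return
    for j in range(Y.shape[1]):
        yield j, X, Y[:, j]      # column j as a 1-D vector of length n_samples

# Placeholder data with the question's shapes.
features = np.zeros((1101, 1))
labels = np.zeros((1101, 900))

problems = list(single_target_problems(features, labels))
print(len(problems))             # 900: one single-target problem per label column
j, X, y = problems[0]
print(X.shape, y.shape)          # (1101, 1) (1101,)
```

With 900 columns this means 900 separate searches, so in practice you may want to keep only the target columns you actually need. scikit-learn's `sklearn.multioutput.MultiOutputRegressor` follows the same one-model-per-target idea for ordinary estimators.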

Upvotes: 3
