Reputation: 91
I've spent many hours trying to figure out what I'm doing wrong, without success. I'm trying to train a neural network on data I collected, using the scikit-learn library in Python.
Website I'm using as reference: http://scikit-learn.org/stable/modules/neural_networks_supervised.html
My data for training_x ends up being an array of arrays which looks similar to this:
[[0.1, 0.2, -0.1], [0.21, -0.32, 0.3]]
training_y is an array of floats which looks like this: [0.3, 0.2]
from datetime import timedelta
from sklearn.neural_network import MLPClassifier

training_x = []
training_y = []
# Collect one training row per company per day in the date range (inclusive).
for day_offset in range(int((end_date - start_date).days) + 1):
    curr_day = start_date + timedelta(day_offset)
    for company in companies:
        output_training_data(cursor, training_x, training_y, company, curr_day)
clf = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(5, 3), random_state=1)
clf.fit(training_x, training_y)
Then I get the following error:
Traceback (most recent call last):
  File "/Users/jodymcadams/Documents/GitHub/moneygen/create_training_data.py", line 194, in <module>
    main()
  File "/Users/jodymcadams/Documents/GitHub/moneygen/create_training_data.py", line 191, in main
    update_data(app_config, companies)
  File "/Users/jodymcadams/Documents/GitHub/moneygen/create_training_data.py", line 169, in update_data
    update_tweets(app_config, companies)
  File "/Users/jodymcadams/Documents/GitHub/moneygen/create_training_data.py", line 154, in update_tweets
    process_twitter(cursor, companies)
  File "/Users/jodymcadams/Documents/GitHub/moneygen/create_training_data.py", line 136, in process_twitter
    clf.fit(training_x, training_y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
    return self._fit(X, y, incremental=False)
  File "/usr/local/lib/python2.7/site-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
    X, y = self._validate_input(X, y, incremental)
  File "/usr/local/lib/python2.7/site-packages/sklearn/neural_network/multilayer_perceptron.py", line 908, in _validate_input
    self._label_binarizer.fit(y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 304, in fit
    self.classes_ = unique_labels(y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/multiclass.py", line 98, in unique_labels
    raise ValueError("Unknown label type: %s" % repr(ys))
ValueError: Unknown label type: (array([ -8.60708650e-04, -1.63581100e-03, 9.93761387e-04,
3.86313466e-04, 4.85415472e-04, 9.92796708e-05,
-7.66657374e-04, -1.60558464e-03, 2.50678922e-03,
-9.75813759e-04, -1.11646082e-03, -2.30801511e-03,
-1.48148148e-03, -2.47524752e-03, 9.89119683e-04,
-4.94804552e-04, 4.94559842e-04, -9.90099010e-04,
2.72479564e-03, -2.36707939e-03, -3.64298725e-04,
1.36425648e-03, -1.81933958e-04, -5.12023407e-03,
Upvotes: 1
Views: 358
Reputation: 1218
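Your labels must be discrete class values, such as integers or strings. A classifier cannot treat continuous float targets as a set of classes, which is why you get "Unknown label type".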
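To check how scikit-learn interprets a target array, something like the following may help. It is only a sketch: it assumes a reasonably recent scikit-learn, and the sample values are made up for illustration, not taken from your data.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets for illustration only.
float_targets = [0.3, 0.2, -0.1]   # non-integer floats, like the y in the question
int_targets = [0, 1, 2]            # discrete class labels

float_kind = type_of_target(float_targets)
int_kind = type_of_target(int_targets)

print(float_kind)  # non-integer floats are reported as a continuous target
print(int_kind)    # integer labels are reported as a classification target
```

If the first call reports a continuous target, a classifier will refuse it and a regressor is the better fit.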
"Classification" is the task of finding a mapping from inputs to discrete outputs (class labels). "Regression" is the task of finding a mapping from inputs to continuous outputs. Since your labels are floats, it looks to me like you're trying to do regression.
If so, consider using MLPRegressor instead.
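A minimal sketch of that swap, reusing the hyperparameters from the question. The two-row data set is the toy example from the question, and max_iter=500 is my own assumption added so the tiny example has room to train. Treat this as a starting point, not a tuned model.

```python
from sklearn.neural_network import MLPRegressor

# Toy data in the same shape as the question: rows of features, float targets.
training_x = [[0.1, 0.2, -0.1], [0.21, -0.32, 0.3]]
training_y = [0.3, 0.2]

# Same settings as the original MLPClassifier call; max_iter is an assumption.
reg = MLPRegressor(solver='adam', alpha=1e-5, hidden_layer_sizes=(5, 3),
                   random_state=1, max_iter=500)
reg.fit(training_x, training_y)

# predict returns one continuous value per input row.
predictions = reg.predict(training_x)
print(predictions)
```

With so little data the fit may emit a convergence warning and the predictions will not be meaningful; the point is only that float targets are accepted without the "Unknown label type" error.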
Upvotes: 3