python-3.x neural-network classification model-fitting valueerror

Reputation: 3699

ValueError: Unknown label type: while implementing MLPClassifier

I have a dataframe with the columns year, month, day, hour, minute, second and Daily_KWH_System. I need to predict the daily KWH using a neural network. Please let me know how to go about it.

      Daily_KWH_System  year  month  day  hour  minute  second
0          4136.900384  2016      9    7     0       0       0
1          3061.657187  2016      9    8     0       0       0
2          4099.614033  2016      9    9     0       0       0
3          3922.490275  2016      9   10     0       0       0
4          3957.128982  2016      9   11     0       0       0

I'm getting a ValueError when I fit the model.

Code so far:

X = df[['year','month','day','hour','minute','second']]
y = df['Daily_KWH_System']

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

#y_train.shape
#X_train.shape

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

#y_train = np.asarray(df['Daily_KWH_System'], dtype="|S6") 

mlp.fit(X_train,y_train)

Error:

ValueError: Unknown label type: (array([  2.27016856e+02,   3.02173014e+03,   4.29404190e+03,
     2.41273427e+02,   1.76714247e+02,   4.23374425e+03,

Upvotes: 11

Answers (4)

Aditya Vikram Singh

Reputation: 21

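Use a regressor instead. The target is a column of continuous floats, which a classifier cannot treat as class labels.

from sklearn.neural_network import MLPRegressor
model = MLPRegressor(solver='lbfgs', alpha=0.001, hidden_layer_sizes=(10, 10))

# X_train, X_test and y_train are the scaled splits from the question
model.fit(X_train, y_train)

y_pred = model.predict(X_test)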

Upvotes: 2

Miriam Farber

Reputation: 19664

First of all, this is a regression problem and not a classification problem, as the values in the Daily_KWH_System column do not form a set of labels. Instead, they seem to be (at least based on the provided example) real numbers.

If you want to approach it as a classification problem regardless, then according to sklearn documentation:

When doing classification in scikit-learn, y is a vector of integers or strings.

In your case, y is a vector of floats, and therefore you get the error. Thus, instead of the line

y = df['Daily_KWH_System']

write the line

y = np.asarray(df['Daily_KWH_System'], dtype="|S6")

and this will resolve the issue. (You can read more about this approach here: Python RandomForest - Unknown label Error)
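To see how scikit-learn itself categorises a target vector, the helper `type_of_target` from `sklearn.utils.multiclass` can be called directly. The sketch below uses small made-up arrays (only the float values are taken from the question) and is meant as an illustration of the float vs. integer/string distinction, not as a description of how MLPClassifier validates its input internally.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Float targets with fractional parts are reported as continuous values.
y_float = np.array([4136.900384, 3061.657187, 4099.614033])
float_kind = type_of_target(y_float)

# Integer targets are reported as class labels.
y_int = np.array([0, 1, 2, 1])
int_kind = type_of_target(y_int)

# String targets are reported as class labels too (labels here are invented).
y_str = np.array(["low", "high", "mid", "low"])
str_kind = type_of_target(y_str)

print(float_kind, int_kind, str_kind)
```

A target reported as continuous is what a regressor expects; classifiers want one of the label types.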

However, since regression is more appropriate in this case, instead of making the change above, replace the lines

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

with

from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor(hidden_layer_sizes=(30,30,30))

The code will run without throwing an error (but there certainly isn't enough data to check whether the model that we get performs well).
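For reference, here is a minimal end-to-end sketch of the regressor variant. The data is synthetic (a made-up sine pattern plus noise standing in for the real KWH series), and `max_iter` and `random_state` are illustrative choices that do not come from the question. scikit-learn may print a convergence warning; the point is only that fitting float targets works.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Synthetic stand-in for the real data: 200 days of invented KWH values.
days = np.arange(200).reshape(-1, 1)
kwh = 4000 + 300 * np.sin(days.ravel() / 7.0) + rng.normal(0, 50, size=200)

X_train, X_test, y_train, y_test = train_test_split(days, kwh, random_state=0)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Same architecture as in the question, but as a regressor.
mlp = MLPRegressor(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

y_pred = mlp.predict(X_test)
print(y_pred.shape)
```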

With that being said, I don't think that this is the right approach for choosing features for this problem.

In this problem we deal with a sequence of real numbers that form a time series. One reasonable feature that we could choose is the number of seconds (or minutes, hours, days, etc.) that have passed since the starting point. Since this particular data contains only days, months and years (the other values are always 0), we could choose as a feature the number of days that have passed since the beginning. Then your data frame will look like:

      Daily_KWH_System  days_passed 
0          4136.900384    0   
1          3061.657187    1     
2          4099.614033    2  
3          3922.490275    3   
4          3957.128982    4

You could take the values in the column days_passed as features and the values in Daily_KWH_System as targets. You may also add some indicator features. For example, if you think that the end of the year may affect the target, you can add an indicator feature that indicates whether the month is December or not.
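A minimal sketch of how these two features could be derived with pandas, using the five rows shown in the question. The column names `days_passed` and `is_december` are illustrative choices; `pd.to_datetime` can assemble dates from columns named `year`, `month` and `day`.

```python
import pandas as pd

df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033,
                         3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
})

# Build one datetime per row from the year/month/day columns.
dates = pd.to_datetime(df[["year", "month", "day"]])

# Number of whole days since the first observation.
df["days_passed"] = (dates - dates.min()).dt.days

# Indicator feature: 1 if the row falls in December, else 0.
df["is_december"] = (dates.dt.month == 12).astype(int)

X = df[["days_passed", "is_december"]]
y = df["Daily_KWH_System"]
print(df[["Daily_KWH_System", "days_passed", "is_december"]])
```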

If the data is indeed daily (at least in this example you have one data point per day) and you want to tackle this problem with neural networks, then another reasonable approach would be to handle it as a time series and try to fit a recurrent neural network. Here are a couple of great blog posts that describe this approach:

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
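Recurrent models are usually trained on fixed-length windows of past values, each paired with the value that follows. The sketch below shows only that data-preparation step with NumPy; `make_windows` and `look_back` are hypothetical names chosen for illustration, and the network itself is left out.

```python
import numpy as np

def make_windows(series, look_back=3):
    """Turn a 1-D series into (window, next value) training pairs."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])   # the previous look_back values
        y.append(series[i + look_back])     # the value to predict
    return np.array(X), np.array(y)

# First five daily values from the question, rounded for readability.
values = [4136.9, 3061.7, 4099.6, 3922.5, 3957.1]
X, y = make_windows(values, look_back=3)

# Keras recurrent layers expect input shaped (samples, time steps, features).
X_rnn = X[..., np.newaxis]
print(X.shape, y.shape, X_rnn.shape)
```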

Upvotes: 14

Chandra

Reputation: 106

Instead of mlp.fit(X_train,y_train), use mlp.fit(X_train,y_train.values).

Upvotes: -1

zeebonk

Reputation: 5044

The fit() function expects y to be a 1D array-like. Selecting columns from a pandas DataFrame can give you a 2D object (for example, double brackets such as df[['Daily_KWH_System']] return a DataFrame rather than a Series). In that case, you need to convert the 2D object into an actual 1D list, as expected by the fit function:

y = list(df['Daily_KWH_System'])
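Whether a conversion is needed depends on how the column was selected, so it can help to check the dimensionality first. A minimal sketch with a three-row stand-in for the question's dataframe; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033]})

# Single brackets return a 1-D Series.
y_series = df["Daily_KWH_System"]

# Double brackets return a 2-D DataFrame with one column.
y_frame = df[["Daily_KWH_System"]]

# Flatten the 2-D selection into a 1-D NumPy array.
y_flat = y_frame.to_numpy().ravel()

print(y_series.ndim, y_frame.ndim, y_flat.shape)
```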

Upvotes: 2
