Ach113

Reputation: 1825

ValueError: setting an array element with a sequence in scikit-learn (sklearn) using GaussianNB

I am trying to make a sklearn image classifier but I am unable to fit the data into a classifier.

x_train = np.array(im_matrix)
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
clf = GaussianNB()
clf.fit(x_train, y_train)

At clf.fit(x_train, y_train) I get the following error:

ValueError: setting an array element with a sequence.

im_matrix is an array holding image matrices:

for file in files:
    path = os.path.join(root, file)
    im_matrix.append(mpimg.imread(path))

The shape of x_train is (10, 1) and the shape of y_train is (10,).

I am guessing the problem is with x_train, as it's weirdly shaped:

array([array([[[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [175, 140, 160],
        [175, 140, 160],
        [175, 140, 160]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [174, 139, 159],
        [174, 139, 159],
        [174, 139, 159]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [173, 138, 158],
        [173, 138, 158],
        [173, 138, 158]],

       ...,

       [[199, 222, 253],
        [121, 142, 169],
        [ 13,  34,  55],
        ...,
        [ 31,  40,  49],
        [ 31,  40,  49],
        [ 32,  41,  50]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]]], dtype=uint8),

This has been asked here several times, but I could not find a solution. Any help would be appreciated.

Upvotes: 2

Views: 3634

Answers (1)

seralouk

Reputation: 33127

Most (if not all) scikit-learn estimators expect the input X to be a 2D array with shape (n_samples, n_features).

See the doc: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB.fit

Fit Gaussian Naive Bayes according to X, y

Parameters: X : array-like, shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

To solve your problem, flatten each image into a 1D vector and put each vector as a row in your x_train matrix. All images need the same dimensions so that every row has the same length.

Finally, use this X to fit the GaussianNB.
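
As a rough sketch of the whole idea, the snippet below stacks a list of equally sized images and reshapes it to (n_samples, n_features) before fitting. The random arrays are only a hypothetical stand-in for your im_matrix, and the 32x32 image size and the cast to float are my assumptions, not something taken from your code:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for im_matrix: 10 random RGB images of identical size.
rng = np.random.default_rng(0)
im_matrix = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
             for _ in range(10)]

# Stack into (n_samples, height, width, channels), then flatten
# every image into a single row of features.
images = np.stack(im_matrix)
x_train = images.reshape(len(images), -1).astype(np.float64)
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

print(x_train.shape)  # (10, 3072)

clf = GaussianNB()
clf.fit(x_train, y_train)
predictions = clf.predict(x_train)
```

Note that np.stack only works when every image has the same shape; if it raises an error, the images probably differ in size or number of channels.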


How to vectorize an image?

Use something like this:

import numpy as np
from PIL import Image

# load the image and convert it to a (height, width, 4) array
img = Image.open('orig.png').convert('RGBA')
arr = np.array(img)

# make a 1-dimensional view of arr (length = height * width * 4)
flat_arr = arr.ravel()
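
If your images do not all have the same size, the flattened vectors will have different lengths and cannot be stacked into one matrix. One possible workaround, sketched below, is to resize every image to a common size before flattening. The helper name image_to_vector and the 64x64 target size are hypothetical choices of mine, and the two in-memory images just stand in for files on disk:

```python
import numpy as np
from PIL import Image


def image_to_vector(img, size=(64, 64)):
    """Convert a PIL image to a flat feature vector of fixed length."""
    # Force 3 channels and a common size so every vector has the same length.
    resized = img.convert("RGB").resize(size)
    return np.asarray(resized).ravel()


# Two in-memory images of different sizes stand in for files on disk.
img_a = Image.new("RGB", (120, 80), color=(227, 255, 233))
img_b = Image.new("RGB", (50, 50), color=(35, 43, 62))

# Each row of X is one image; all rows have 64 * 64 * 3 features.
X = np.vstack([image_to_vector(img) for img in (img_a, img_b)])
print(X.shape)  # (2, 12288)
```

Resizing changes the aspect ratio here, which may or may not matter for your classifier.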

Upvotes: 2
