Reputation: 17418
I have a numpy array like this:
[[0. 1. 1. ... 0. 0. 1.]
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 1. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 0. ... 1. 0. 1.]]
I transform it like this to reduce the memory demand:
x_val = x_val.astype(np.int)
resulting in this:
[[0 1 1 ... 0 0 1]
[0 0 0 ... 0 0 1]
[0 0 1 ... 0 0 0]
...
[0 0 0 ... 0 0 1]
[0 0 0 ... 0 0 1]
[0 0 0 ... 1 0 1]]
However, when I do this:
x_val = to_categorical(x_val)
I get:
in to_categorical
categorical = np.zeros((n, num_classes), dtype=np.float32)
MemoryError
Any ideas why? Ultimately, the numpy array contains the labels for a binary classification problem. So far, I have used it as float32 as-is in a Keras ANN, it worked fine, and I achieved pretty good performance. So is it actually necessary to run to_categorical at all?
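For reference, here is a small sketch of the memory demand per dtype (the shape is made up for illustration; my real array is larger, and I assume np.int maps to int64 on this platform):

```python
import numpy as np

# Made-up shape for illustration only.
n_samples, n_features = 10_000, 100

x = np.zeros((n_samples, n_features), dtype=np.float64)
print(x.nbytes)                   # 8_000_000 -> 8 bytes per element
print(x.astype(np.int64).nbytes)  # 8_000_000 -> still 8 bytes, no saving
print(x.astype(np.uint8).nbytes)  # 1_000_000 -> 1 byte per element
```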
Upvotes: 1
Views: 1790
Reputation: 33420
You don't need to use to_categorical, since I guess you are doing multi-label classification. To avoid any confusion once and for all(!), let me explain this.

If you are doing binary classification, meaning each sample may belong to only one of two classes, e.g. cat vs. dog, happy vs. sad, or positive review vs. negative review, then:

- The labels look like [0 1 0 0 1 ... 0] with a shape of (n_samples,), i.e. each sample has a one (e.g. cat) or a zero (e.g. dog) label.
- The activation function of the last layer is sigmoid (or any other function that outputs a value in the range [0, 1]).
- The loss function is binary_crossentropy.
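This setup can be sketched without Keras; a minimal numpy version (the logits are made up, and the clipping constant mimics what Keras does internally):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 1, 0, 0, 1])               # shape (n_samples,)
logits = np.array([-2.0, 3.0, -1.0, 0.5, 2.0])   # hypothetical model outputs
y_pred = sigmoid(logits)                         # each value in (0, 1)

print(binary_crossentropy(y_true, y_pred))
```

With perfect predictions (`y_pred == y_true`) the loss is ~0; the worse the predictions, the larger it grows.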
If you are doing multi-class classification, meaning each sample may belong to only one of many classes, e.g. cat vs. dog vs. lion, happy vs. neutral vs. sad, or positive vs. neutral vs. negative review, then:

- The labels are either one-hot encoded, e.g. [1, 0, 0] corresponds to cat, [0, 1, 0] corresponds to dog and [0, 0, 1] corresponds to lion, in which case the labels have a shape of (n_samples, n_classes); or they are integers (i.e. sparse labels), e.g. 1 for cat, 2 for dog and 3 for lion, in which case the labels have a shape of (n_samples,). The to_categorical function is used to convert sparse labels to one-hot encoded labels, if you wish to do so.
- The activation function of the last layer is softmax.
- The loss function is categorical_crossentropy if the labels are one-hot encoded, and sparse_categorical_crossentropy if they are sparse.

If you are doing multi-label classification, meaning each sample may belong to zero, one, or more than one class, e.g. an image may contain both a cat and a dog, then:
- The labels look like [[1 0 0 1 ... 0], ..., [0 0 1 0 ... 1]] with a shape of (n_samples, n_classes). For example, a label of [1 1] means the corresponding sample belongs to both classes (e.g. cat and dog).
- The activation function of the last layer is sigmoid, since presumably each class is independent of the others.
- The loss function is binary_crossentropy.

Upvotes: 9
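The sparse-to-one-hot conversion that to_categorical performs can be sketched in plain numpy (np.eye indexing is a common equivalent; note that class indices here start at 0, unlike the 1-based cat/dog/lion example above, and the compact dtype is a choice, not what to_categorical does):

```python
import numpy as np

sparse = np.array([0, 2, 1, 0])  # shape (n_samples,): 0=cat, 1=dog, 2=lion
n_classes = sparse.max() + 1

# Equivalent of keras.utils.to_categorical(sparse), but with uint8
# instead of float32 to keep the memory footprint small.
one_hot = np.eye(n_classes, dtype=np.uint8)[sparse]
print(one_hot)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```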
Reputation: 17418
Setting aside the fact that applying to_categorical is pointless in my scenario, the following solves the memory issue:
x_val = x_val.astype(np.uint8)
Upvotes: 0