Reputation: 644
I am trying to build a model that recognises human emotions. My code and RAM usage are fine at the start, but when I try to normalise my images, RAM usage jumps drastically and Colab crashes. This is the code block that causes the crash:
import os
import matplotlib.pyplot as plt
import cv2

data = []
for emot in os.listdir('./data/'):
    for file_ in os.listdir(f'./data/{emot}'):
        img = cv2.imread(f'./data/{emot}/{file_}', 0)
        img = cv2.bitwise_not(img)
        img /= 255.0  # <--- This is the line that causes Colab to crash
        data.append([img, emotions.index(emot)])
If I remove the img /= 255.0 line, it doesn't crash, but then my images are not normalised!
I even tried normalising in a separate block:

for i in range(len(data)):
    data[i][0] = np.array(data[i][0]) / 255.0

but that doesn't work either and it still crashes.
Upvotes: 3
Views: 4663
Reputation: 637
I would like to go through an example. First, let's have a look at the following code.
import numpy as np
x = np.random.randint(0, 255, size=(100, 32, 32), dtype=np.int16)
print('Present data type', x.dtype)
# What you did
y = x/255
print('Present data type', y.dtype)
# What you should do
z = (x/255).astype(np.float16)
print('Present data type', z.dtype)
Output:
Present data type int16
Present data type float64
Present data type float16
If you look closely, when I divide the x variable and declare y = x/255, the data type changes to float64. When you divide a NumPy array of an int dtype, it is typecast to float64 by default. float64 uses more memory (8 bytes per element, versus 2 for float16), so when dividing an int NumPy matrix you should always typecast to a shorter data type for large datasets.
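To make the memory difference concrete, here is a small sketch with an illustrative corpus size (10,000 grayscale 48x48 images; the exact numbers from the question are unknown, so these are assumptions):

```python
import numpy as np

# Hypothetical corpus: 10,000 grayscale images of 48x48 pixels
imgs = np.random.randint(0, 256, size=(10000, 48, 48), dtype=np.uint8)
print(imgs.nbytes / 1e6)    # ~23 MB as uint8 (1 byte per pixel)

# Plain division promotes to float64: 8 bytes per pixel, 8x the memory
as_f64 = imgs / 255.0
print(as_f64.nbytes / 1e6)  # ~184 MB

# Downcasting right after the division keeps it at 2 bytes per pixel
as_f16 = (imgs / 255.0).astype(np.float16)
print(as_f16.nbytes / 1e6)  # ~46 MB
```

On a full dataset this 8x blow-up is exactly the kind of jump that exhausts Colab's RAM.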
If your code runs fine without the img /= 255.0 line, then this is the case. After dividing, you should typecast the img variable to the smallest float type that works, such as np.float16 or np.float32. However, np.float16 has some limitations and is not fully supported by TensorFlow (TF converts it to a 32-bit float), so you may prefer np.float32.

Therefore, try replacing img /= 255.0 with img = (img/255.0).astype(np.float16) or img = (img/255.0).astype(np.float32). Note that astype returns a new array rather than modifying the array in place, so its result must be assigned back.
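One detail worth demonstrating: NumPy's astype never modifies an array in place, so calling it without assigning the result back silently does nothing:

```python
import numpy as np

img = np.random.randint(0, 256, size=(48, 48)) / 255.0
img.astype(np.float16)        # no effect: astype returns a NEW array
print(img.dtype)              # still float64

img = img.astype(np.float16)  # assign the result back
print(img.dtype)              # float16
```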
The modified version of the code is given below:

import os
import numpy as np
import matplotlib.pyplot as plt
import cv2

data = []
for emot in os.listdir('./data/'):
    for file_ in os.listdir(f'./data/{emot}'):
        img = cv2.imread(f'./data/{emot}/{file_}', 0)
        img = cv2.bitwise_not(img)
        img = (img/255.0).astype(np.float16)  # <--- This is the suggestion
        data.append([img, emotions.index(emot)])
Upvotes: 1
Reputation: 168
Assuming the next step in your pipeline is to create a tf.data.Dataset object from your image corpus, you can use Dataset.map() to move the preprocessing into the data loading pipeline and save memory. TensorFlow has a very well-documented guide on how to do this here -> https://www.tensorflow.org/guide/data#preprocessing_data
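A minimal sketch of that idea, using random stand-in arrays since the actual image corpus and label set from the question are not available (in practice you would build the dataset from file paths and decode inside map() as well):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the loaded corpus: uint8 images + integer labels
images = np.random.randint(0, 256, size=(100, 48, 48), dtype=np.uint8)
labels = np.random.randint(0, 7, size=(100,))

def normalize(img, label):
    # Runs lazily per element/batch, so a full float copy of the
    # dataset never has to sit in RAM at once
    return tf.cast(img, tf.float32) / 255.0, label

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))

for batch_imgs, batch_labels in ds.take(1):
    print(batch_imgs.dtype, batch_imgs.shape)  # float32 (32, 48, 48)
```

Only the current prefetched batches are held in float form; the rest of the data stays as compact uint8 (or on disk) until it is needed.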
Upvotes: 0