Dataset preprocessing for Python AI

Question

I am using a Keras library to preprocess my data, which first requires sorting the photos into folders by their classification. I didn't want to do that manually, so I wrote my own script, but it isn't working. Could somebody help me debug it? It doesn't give a specific error; it just doesn't finish the job and stops at photo ISIC_0000006. wah is the folder for photos classified as cancerous, and yay is the folder for photos classified as benign. The dataset gives a 1 if the photo is bad and a 0 if it is okay. I still don't know what the problem is.
This is the dataset I am using.

By the way, I am only a kid still so please don't expect me to know too much about programming.

Sample lines from dataset:

ISIC_0000000 = 0
ISIC_0000001 = 0
ISIC_0000002 = 1
ISIC_0000003 = 0
ISIC_0000004 = 0
ISIC_0000005 = 1

My code:

import pandas as pd
import os
dataset = pd.read_csv('ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x < 10:
            prevName = 'ISIC_000000' + str(x-1) + '.jpg'
            if prevName == 'ISIC_0000005.jpg': #dataset has random hole so skips over
                x = x + 1
                prevName = 'ISIC_0000006.jpg'
        elif x < 100: 
            prevName = 'ISIC_00000' + str(x-1) + '.jpg'
        elif x < 1000:
            prevName = 'ISIC_0000' + str(x-1) + '.jpg'
        else:
            prevName = 'ISIC_000' + str(x-1) + '.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' +  str(xb)
        if newName == 'benign' +  str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
        prevName = 'ISIC_000000' + str(x+1) + '.jpg'

EDIT!!! Thanks to Abhineet Gupta, my new code gets further through the dataset, but it oddly stops at photo 34:

import pandas as pd
import os
dataset = pd.read_csv('_ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    prevName = 'ISIC_' +  str(x).zfill(7) + '.jpg'
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x == '0000005':
            x = x + 1
            prevName = 'ISIC_000006.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' +  str(xb)
        if newName == 'benign' +  str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
        prevName = 'ISIC_000000' + str(x+1) + '.jpg'

Last edit: it turns out it wasn't the code's fault; the .csv file was just messed up. Thanks Abhineet Gupta and mrk for the solutions!!!

Abhineet Gupta · Accepted Answer

Based on the above code, the error seems to occur in the following section:

11:     x = x + 1
12:     if prevName == dataset.iloc[x-1, 0] + '.jpg':
13:         if x < 10:
14:             prevName = 'ISIC_000000' + str(x-1) + '.jpg'
15:             if prevName == 'ISIC_0000005.jpg':
16:                 x = x + 1
17:                 prevName = 'ISIC_0000006.jpg'
...
36:         prevName = 'ISIC_000000' + str(x+1) + '.jpg'

So, if x == 5 and prevName == 'ISIC_0000005.jpg' at the start of an iteration:

Line 11 assigns x -> 6,

Lines 12 and 13 are true,

Line 14 assigns prevName -> 'ISIC_0000005.jpg',

Line 15 is true,

Lines 16 and 17 assign x -> 7 and prevName -> 'ISIC_0000006.jpg'.

Then Line 36 (the last line), which is outside the if x < 10 block, assigns prevName -> 'ISIC_0000008.jpg'.

When the loop restarts, Line 11 assigns x -> 8,

Line 12 is false, and the program continues until x > 1500 without entering the if block.

To fix the code, I recommend using str(x).zfill(7), which pads the integer with leading zeros: for x = 5 it returns '0000005', and for x = 95 it returns '0000095'. This eliminates the need to choose the number of leading zeros based on how many digits x has, and simplifies your code.
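As a rough illustration of the zfill suggestion, here is a minimal sketch. The helper names isic_filename and plan_moves are made up for this example, and the sample rows are invented rather than taken from the real .csv. The second helper is an assumption about one way to simplify further: it takes the file name from the ID stored in each row instead of from a counter, so a gap in the numbering should not throw the loop off. It only builds the list of renames and does not touch any files.

```python
def isic_filename(index: int) -> str:
    """Build a name such as 'ISIC_0000005.jpg' from an integer index."""
    # zfill(7) pads the digits with leading zeros to a width of 7.
    return "ISIC_" + str(index).zfill(7) + ".jpg"


def plan_moves(rows):
    """Turn (image_id, label) rows into (source, destination) pairs.

    Hypothetical helper: label 1 goes to 'wah/melanomaN.jpg' and
    label 0 goes to 'yay/benignN.jpg', as in the question.
    """
    counts = {"benign": 0, "melanoma": 0}
    moves = []
    for image_id, label in rows:
        kind = "melanoma" if int(label) == 1 else "benign"
        folder = "wah" if kind == "melanoma" else "yay"
        counts[kind] += 1
        moves.append((image_id + ".jpg", f"{folder}/{kind}{counts[kind]}.jpg"))
    return moves


# Invented sample rows with a gap in the numbering.
sample_rows = [("ISIC_0000000", 0), ("ISIC_0000002", 1), ("ISIC_0000006", 0)]

print(isic_filename(5))
print(isic_filename(95))
for source, destination in plan_moves(sample_rows):
    print(source, "->", destination)
```

Each (source, destination) pair could then be passed to os.rename once the printed list looks right, which makes it easier to spot a bad row in the .csv before any file is moved.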
