Hudson Liu

Reputation: 49

Dataset preprocessing for Python AI

I am using a Keras library to preprocess my data, which first requires sorting the photos into folders by classification. I didn't want to do that manually, so I wrote my own script, but it isn't working. Could somebody help me debug it? It doesn't give a specific error; it just doesn't finish the job and stops at photo ISIC_0000006. wah is the folder for photos classified as cancerous, and yay is the folder for photos classified as benign. The dataset gives a 1 if the photo is bad and a 0 if it is okay. I still don't know what the problem is.
This is the dataset I am using.

By the way, I am only a kid still so please don't expect me to know too much about programming.

Sample lines from dataset:

ISIC_0000000 = 0
ISIC_0000001 = 0
ISIC_0000002 = 1
ISIC_0000003 = 0
ISIC_0000004 = 0
ISIC_0000005 = 1

My code:

import pandas as pd
import os
dataset = pd.read_csv('ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x < 10:
            prevName = 'ISIC_000000' + str(x-1) + '.jpg'
            if prevName == 'ISIC_0000005.jpg': #dataset has random hole so skips over
                x = x + 1
                prevName = 'ISIC_0000006.jpg'
        elif x < 100: 
            prevName = 'ISIC_00000' + str(x-1) + '.jpg'
        elif x < 1000:
            prevName = 'ISIC_0000' + str(x-1) + '.jpg'
        else:
            prevName = 'ISIC_000' + str(x-1) + '.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' +  str(xb)
        if newName == 'benign' +  str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
        prevName = 'ISIC_000000' + str(x+1) + '.jpg'

EDIT: This is my new code, thanks to Abhineet Gupta. It gets further through the dataset, but oddly stops at photo 34:

import pandas as pd
import os
dataset = pd.read_csv('_ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    prevName = 'ISIC_' +  str(x).zfill(7) + '.jpg'
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x == '0000005':
            x = x + 1
            prevName = 'ISIC_000006.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' +  str(xb)
        if newName == 'benign' +  str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
        prevName = 'ISIC_000000' + str(x+1) + '.jpg'

Last edit: it turns out it wasn't the code's fault; the .csv file was just messed up. Thanks to Abhineet Gupta and mrk for the solutions!

Upvotes: 1

Views: 160

Answers (2)

mrk

Reputation: 10406

Your csv file uses '=' as the delimiter, so you have to tell pandas that when loading it. At least, that was the error I ran into when trying to run your code.

Try changing your line to:

dataset = pd.read_csv('ISIC-2017_Training_Part3_GroundTruth.csv', sep = '=')

With this change the code runs for me through the whole csv file you have provided.
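
As a rough sketch of what that looks like on the sample lines from the question: the column names image_id and label are made up for this example, and I am assuming the file has no header row, which may not match your actual file.

```python
import io

import pandas as pd

# Stand-in for the ground-truth file, built from the sample lines in the question.
sample = io.StringIO(
    "ISIC_0000000 = 0\n"
    "ISIC_0000001 = 0\n"
    "ISIC_0000002 = 1\n"
)

# Assumption: no header row; "image_id" and "label" are hypothetical column names.
labels = pd.read_csv(sample, sep="=", header=None, names=["image_id", "label"])

# Splitting on '=' keeps the spaces around it, so strip them from the id column
# before comparing against file names.
labels["image_id"] = labels["image_id"].str.strip()
labels["label"] = labels["label"].astype(int)

print(labels)
```

Stripping the ids matters because a trailing space would make a comparison such as prevName == dataset.iloc[x-1, 0] + '.jpg' fail even when the names look identical.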

Note: A library you should definitely take a look at for image augmentation is to be found here.

Upvotes: 0

Abhineet Gupta

Reputation: 629

Based on the code above, the error seems to occur in the following section:

11:     x = x + 1
12:     if prevName == dataset.iloc[x-1, 0] + '.jpg':
13:         if x < 10:
14:             prevName = 'ISIC_000000' + str(x-1) + '.jpg'
15:             if prevName == 'ISIC_0000005.jpg':
16:                 x = x + 1
17:                 prevName = 'ISIC_0000006.jpg'
...
36:         prevName = 'ISIC_000000' + str(x+1) + '.jpg'

So, if x == 5 and prevName == 'ISIC_0000005.jpg':

Line 11 assigns x -> 6,

Lines 12 and 13 are true,

Line 14 assigns prevName -> 'ISIC_0000005.jpg',

Line 15 is true,

Lines 16 and 17 assign x -> 7 and prevName -> 'ISIC_0000006.jpg'.

Then Line 36 (the last line), which is outside the inner if statements, assigns prevName -> 'ISIC_0000008.jpg'.

When the loop restarts, Line 11 assigns x -> 8,

Line 12 is false, and the program continues until x > 1500 without entering the if block.

To fix the code, I recommend using str(x).zfill(7), which pads the integer with leading zeros: for x = 5 it returns '0000005', and for x = 95 it returns '0000095'. This removes the need to choose the number of leading zeros based on how many digits x has, and simplifies your code.
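
A minimal sketch of that idea is below. The helper name isic_filename is just something I made up for illustration; only str.zfill comes from Python itself.

```python
def isic_filename(index: int) -> str:
    """Build a file name like 'ISIC_0000005.jpg' from an integer index."""
    # zfill(7) left-pads the digits with zeros to a total width of 7.
    return "ISIC_" + str(index).zfill(7) + ".jpg"


for i in (5, 95, 1500):
    print(isic_filename(i))
```

One call replaces the whole if x < 10 / elif x < 100 / elif x < 1000 chain, so there is only one place where the name format is defined.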

Upvotes: 1
