Reputation: 49
I am using the Keras library to preprocess my data, after an initial step of sorting the photos into folders by their classification. I didn't want to do that manually, so I wrote my own script, but it isn't working. Could somebody help me debug it?
It doesn't give a specific error; it just doesn't finish the job and stops at photo ISIC_0000006. The wah folder is for photos classified as cancerous, and the yay folder is for photos classified as benign. The dataset gives a 1 if the lesion is bad and a 0 if it is okay. I still don't know what the problem is.
This is the dataset I am using.
By the way, I am only a kid still so please don't expect me to know too much about programming.
Sample lines from dataset:
ISIC_0000000 = 0
ISIC_0000001 = 0
ISIC_0000002 = 1
ISIC_0000003 = 0
ISIC_0000004 = 0
ISIC_0000005 = 1
My code:
import pandas as pd
import os
dataset = pd.read_csv('ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x < 10:
            prevName = 'ISIC_000000' + str(x-1) + '.jpg'
            if prevName == 'ISIC_0000005.jpg': #dataset has random hole so skips over
                x = x + 1
                prevName = 'ISIC_0000006.jpg'
        elif x < 100:
            prevName = 'ISIC_00000' + str(x-1) + '.jpg'
        elif x < 1000:
            prevName = 'ISIC_0000' + str(x-1) + '.jpg'
        else:
            prevName = 'ISIC_000' + str(x-1) + '.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' + str(xb)
        if newName == 'benign' + str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
    prevName = 'ISIC_000000' + str(x+1) + '.jpg'
EDIT: This is my new code, thanks to Abhineet Gupta. It gets further through the dataset, but oddly stops at photo 34:
import pandas as pd
import os
dataset = pd.read_csv('_ISIC-2017_Training_Part3_GroundTruth.csv')
dataset = dataset.iloc[:, :-1]
x = 0
xb = 0
xm = 0
prevName = 'ISIC_0000000.jpg'
newName = 'yay/benign1'
while(x <= 1500):
    x = x + 1
    prevName = 'ISIC_' + str(x).zfill(7) + '.jpg'
    if prevName == dataset.iloc[x-1, 0] + '.jpg':
        if x == '0000005':
            x = x + 1
            prevName = 'ISIC_000006.jpg'
        if dataset.iloc[x-1, 1] == 1:
            xm = xm + 1
            newName = 'melanoma' + str(xm)
        else:
            xb = xb + 1
            newName = 'benign' + str(xb)
        if newName == 'benign' + str(xb):
            newName = 'yay/' + newName + '.jpg'
            os.rename(prevName, newName)
        else:
            newName = 'wah/' + newName + '.jpg'
            os.rename(prevName, newName)
    prevName = 'ISIC_000000' + str(x+1) + '.jpg'
Last edit: it turns out it wasn't the code's fault; the .csv file was just messed up. Thanks, Abhineet Gupta and mrk, for the solutions!
Upvotes: 1
Views: 160
Reputation: 10406
Since you're reading a csv file that uses '=' as its delimiter, you have to specify that when loading it; at least, that was the error I ran into when trying to run your code.
Try changing your line to:
dataset = pd.read_csv('ISIC-2017_Training_Part3_GroundTruth.csv', sep = '=')
With this change the code runs for me through the whole csv file you have provided.
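To see why the delimiter matters without needing the real file, here is a minimal sketch that parses the sample lines from the question. The column names image_id and melanoma are made up for illustration, header=None assumes there is no header row (the real file may have one), and the regular-expression separator is my own variation on sep='=' that also swallows the spaces around the equals sign.

```python
import io
import pandas as pd

# Sample rows copied from the question; the real file may look different.
SAMPLE = """ISIC_0000000 = 0
ISIC_0000001 = 0
ISIC_0000002 = 1
ISIC_0000003 = 0
ISIC_0000004 = 0
ISIC_0000005 = 1
"""

# Default comma delimiter: every line ends up in one single column.
wrong = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Split on '=' plus any surrounding spaces (a regex separator needs the Python engine).
# 'image_id' and 'melanoma' are hypothetical column names.
right = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s*=\s*",
    engine="python",
    header=None,
    names=["image_id", "melanoma"],
)

print(wrong.shape)   # one column only
print(right.shape)   # two columns: name and label
print(right.head())
```

With only one column loaded, dataset.iloc[x-1, 1] has no label column to read, so checking dataset.shape right after loading is a quick way to catch this.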
Note: A library you should definitely take a look at for image augmentation is to be found here.
Upvotes: 0
Reputation: 629
Based on the code above, the error seems to occur in the following section:
11:     x = x + 1
12:     if prevName == dataset.iloc[x-1, 0] + '.jpg':
13:         if x < 10:
14:             prevName = 'ISIC_000000' + str(x-1) + '.jpg'
15:             if prevName == 'ISIC_0000005.jpg':
16:                 x = x + 1
17:                 prevName = 'ISIC_0000006.jpg'
...
36:     prevName = 'ISIC_000000' + str(x+1) + '.jpg'
So, if x == 5 and prevName == 'ISIC_0000005.jpg':
Line 11 assigns x -> 6.
Lines 12 and 13 are true.
Line 14 assigns prevName -> 'ISIC_0000005.jpg'.
Line 15 is true.
Lines 16 and 17 assign x -> 7 and prevName -> 'ISIC_0000006.jpg'.
Then Line 36 (the last line), which is outside the if statement, assigns prevName -> 'ISIC_0000008.jpg'.
When the loop restarts, Line 11 assigns x -> 8, Line 12 is false, and the program continues until x > 1500 without ever entering the if block again.
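The trace above can be replayed without renaming any files. This is only a sketch: run_iterations is a hypothetical helper, the ids list is a made-up stand-in for the first CSV column (IDs 0 to 19 with no gaps), and the elif branches and os.rename calls are left out because they do not affect the trace.

```python
def run_iterations(x, prev_name, ids, steps):
    """Replay `steps` passes of the question's loop body without touching files."""
    history = []
    for _ in range(steps):
        x = x + 1                                                # line 11
        entered = prev_name == ids[x - 1] + '.jpg'               # line 12
        if entered:
            if x < 10:                                           # line 13
                prev_name = 'ISIC_000000' + str(x - 1) + '.jpg'  # line 14
                if prev_name == 'ISIC_0000005.jpg':              # line 15
                    x = x + 1                                    # line 16
                    prev_name = 'ISIC_0000006.jpg'               # line 17
            # elif branches and os.rename calls omitted in this sketch
        prev_name = 'ISIC_000000' + str(x + 1) + '.jpg'          # line 36
        history.append((x, prev_name, entered))
    return history


# Hypothetical stand-in for the ID column of the CSV.
ids = ['ISIC_' + str(n).zfill(7) for n in range(20)]

for state in run_iterations(5, 'ISIC_0000005.jpg', ids, 3):
    print(state)  # (x, prevName, whether the if block was entered)
```

The first pass enters the if block and ends with x = 7 and prevName = 'ISIC_0000008.jpg'; on the following passes prevName stays one step ahead of the row being compared, so the block is skipped.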
To fix the code, I recommend using str(x).zfill(7), which pads the integer with leading zeros: for x = 5 it returns '0000005', and for x = 95 it returns '0000095'. This would eliminate the need to specify the leading zeros based on the number of digits in x, and simplify your code.
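As a rough sketch of how that could look, the helper below builds a file name with zfill. The second helper goes one step further than this answer and is my own assumption: it takes the file name straight from each CSV row instead of counting, so gaps in the numbering would not matter. Both function names are hypothetical, and the folder names follow the question.

```python
def image_filename(number):
    """Build an ISIC file name from an integer, padding to 7 digits."""
    return 'ISIC_' + str(number).zfill(7) + '.jpg'


def plan_moves(rows):
    """Turn (image_id, label) rows into (source, destination) pairs."""
    benign = 0
    melanoma = 0
    moves = []
    for image_id, label in rows:
        if label == 1:
            melanoma += 1
            dest = 'wah/melanoma' + str(melanoma) + '.jpg'
        else:
            benign += 1
            dest = 'yay/benign' + str(benign) + '.jpg'
        moves.append((image_id + '.jpg', dest))
    return moves


print(image_filename(5))
print(image_filename(95))

# Made-up rows with gaps in the numbering, to show that gaps are harmless here.
rows = [('ISIC_0000000', 0), ('ISIC_0000002', 1), ('ISIC_0000006', 0)]
for src, dst in plan_moves(rows):
    print(src, '->', dst)  # os.rename(src, dst) would perform the actual move
```

Printing the planned moves first makes it easy to check the result before any file is renamed.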
Upvotes: 1