Reputation: 1
I am trying to delete some files that are not on the list (list_of_labeled), but the condition is never true, and for some reason it is checked only 4010 times. It should be checked 4010 * the number of files (10000) times.
c = 0
with open("C:\_base\MyCode\Song_genre_classification\MillionSongSubset\list_of_labeled.txt") as list_of_labeled:
    for path, subdirs, files in os.walk("C:\_base\MyCode\Song_genre_classification\MillionSongSubset\data"):
        for name in files:
            for line in list_of_labeled:
                c += 1
                if name == line[:21]:
                    # os.remove(path)
                    print(name)
print(c)
Upvotes: 0
Views: 51
Reputation: 178115
You open the file once, outside the loops. The first time the inner loop (for line in list_of_labeled:) runs, it reads the file to the end, which is why the counter stops at 4010. Every later pass reads nothing because the file position is still at the end. Either rewind the file each time before the for line loop, or load the file into a list and re-use the list.
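A minimal sketch of the exhaustion problem and both fixes. It uses io.StringIO as a stand-in for the real file, and the three file names are made-up sample data, not taken from the question:

```python
import io

# Stand-in for the real list file; the three lines are made-up sample data.
list_of_labeled = io.StringIO("a.h5\nb.h5\nc.h5\n")

first_pass = [line.strip() for line in list_of_labeled]
second_pass = [line.strip() for line in list_of_labeled]  # already at end of file

# Option 1: rewind to the start before iterating again.
list_of_labeled.seek(0)
after_rewind = [line.strip() for line in list_of_labeled]

# Option 2: read the lines into a list once and re-use the list.
list_of_labeled.seek(0)
lines = list_of_labeled.read().splitlines()

print(len(first_pass), len(second_pass), len(after_rewind), len(lines))
# prints: 3 0 3 3
```

A real file object returned by open() behaves the same way: iterating it a second time without seek(0) yields no lines.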
Also, this algorithm is really slow. Instead of reading the entire file and slicing each line for every name, read the file once, slice each line, and store the results in a set for fast searching. Something like (untested):
import os

# Read the list once and keep the first 21 characters of each line in a set.
with open(r"C:\_base\MyCode\Song_genre_classification\MillionSongSubset\list_of_labeled.txt") as list_of_labeled:
    lines = {line[:21] for line in list_of_labeled}
# Walk the data directory and test each file name against the set.
for path, subdirs, files in os.walk(r"C:\_base\MyCode\Song_genre_classification\MillionSongSubset\data"):
    for name in files:
        if name in lines:
            print(name)
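If the goal is to find the files that are not on the list, the same set can be used with "not in". The sketch below is one possible way to do that, not a tested solution for the real data. The function name files_not_on_list, the 21-character sample names, and the tab-separated line layout are all assumptions made for illustration. It only collects paths and deletes nothing. Note that in os.walk the first value is the directory, so a full file path needs os.path.join(path, name).

```python
import os
import tempfile


def files_not_on_list(data_dir, list_path, key_len=21):
    """Return full paths of files under data_dir whose names are not on the list.

    Hypothetical helper: assumes each line of the list file starts with the
    file name, which is exactly key_len characters long.
    """
    with open(list_path) as list_of_labeled:
        labeled = {line[:key_len] for line in list_of_labeled}
    missing = []
    for path, subdirs, files in os.walk(data_dir):
        for name in files:
            if name not in labeled:
                # path is the directory; join it with the file name.
                missing.append(os.path.join(path, name))
    return missing


# Demo on a throwaway directory with made-up 21-character file names.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(os.path.join(data_dir, "A"))
    names = [f"TR{i:016d}.h5" for i in range(3)]
    for name in names:
        open(os.path.join(data_dir, "A", name), "w").close()

    # Only the first two files are listed; the line layout is an assumption.
    list_path = os.path.join(tmp, "list_of_labeled.txt")
    with open(list_path, "w") as f:
        f.write(names[0] + "\tgenre\n")
        f.write(names[1] + "\tgenre\n")

    to_delete = files_not_on_list(data_dir, list_path)
    print([os.path.basename(p) for p in to_delete])
```

Printing the candidates first and checking them by hand is a safer step before calling os.remove on each returned path.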
Upvotes: 1