Reputation: 119
I have a folder with a list of files named as follows:
0.csv, 1.csv, 2.csv, ..., 359.csv
How do I find the numbers that are missing from these file names, assuming that the list starts with 0 and ends with 359?
The following code snippet reads all the files in the folder.
import os

folder = '..Acad/Code'
for filename in os.listdir(folder):
    # build the full path of each entry in the folder
    infilename = os.path.join(folder, filename)
The following code snippet displays the missing elements from a list of integers.
def missing_numbers(num_list):
    original_list = [x for x in range(num_list[0], num_list[-1] + 1)]
    num_list = set(num_list)
    return (list(num_list ^ set(original_list)))
How do I modify the above snippet to read from the output of the previous code? Any idea?
Upvotes: 1
Views: 143
Reputation: 12410
Your infilename doesn't collect all the file names in the folder, because you overwrite it on each loop iteration, so it only holds the last entry. How about this solution, using list comprehensions:
# separate the file name from the file name extension for each file in the folder
# (converted to a set once, so it is not rebuilt for every number checked below)
filenumbers = set([x.split(".")[0] for x in os.listdir(folder)])
# compare the numbers in the expected range with the file numbers in the folder
missingnumbers = [i for i in range(360) if str(i) not in filenumbers]
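Put together, this could look like the sketch below. The function name find_missing_numbers and the expected_count parameter are my own illustrative choices, not anything from the question, and it assumes every file of interest is named <number>.<extension>.

```python
import os

def find_missing_numbers(folder, expected_count=360):
    # Hypothetical helper: the name and the default of 360 are assumptions
    # taken from the 0..359 range mentioned in the question.
    # Strip the extension from every entry, e.g. "12.csv" -> "12".
    present = {os.path.splitext(name)[0] for name in os.listdir(folder)}
    # Keep every expected number whose string form has no matching file.
    return [i for i in range(expected_count) if str(i) not in present]
```

Calling find_missing_numbers(folder) with the folder from the question should then return the missing numbers in ascending order.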
Upvotes: 2
Reputation: 14744
You should make a set of the ints you have, as follows:
my_set = set(int(f.split('.csv')[0]) for f in os.listdir('./') if os.path.isfile(f) and f.endswith('.csv'))
Then compare this with the set of all expected ints:
missing_ints = set(range(max(my_set))) - my_set
missing_files = [str(i) + '.csv' for i in missing_ints]
This will give you the ints which are not in the list of files you have. So if you have 0.csv, 1.csv and 3.csv, then my_set is {0, 1, 3}, max(my_set) is 3, set(range(max(my_set))) is {0, 1, 2}, the difference missing_ints = set(range(max(my_set))) - my_set is {2}, and therefore missing_files = ['2.csv'].
If you have a large number of files, comparing sets will be faster than comparing strings.
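Here is a minimal sketch of the set-based approach as a function. It assumes the expected range is known up front (0 to 359 in the question) instead of being derived from max(); the function name missing_csv_files and the total parameter are illustrative and not part of the original code.

```python
import os

def missing_csv_files(folder, total=360):
    # Hypothetical helper; `total` is the assumed number of expected files.
    present = set()
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        path = os.path.join(folder, name)
        # Only count regular files named <digits>.csv; skip everything else.
        if ext == '.csv' and stem.isdecimal() and os.path.isfile(path):
            present.add(int(stem))
    missing = set(range(total)) - present
    # Sets are unordered, so sort before building the file names.
    return [f'{i}.csv' for i in sorted(missing)]
```

With a fixed range, a missing 359.csv is reported as well, which a bound taken from max() of the existing files cannot detect.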
Upvotes: 1