Sahitya Sridhar

Reputation: 119

Print the numbers that don't appear in a folder's file names (Python)

I have a folder containing files named as follows:

0.csv, 1.csv, 2.csv, ..., 359.csv

How do I find the numbers that are missing from these file names, given that the sequence should start at 0 and end at 359?

The following code snippet loops over all the files in the folder:

import os
folder = '..Acad/Code'
for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
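The loop above rebinds infilename on every iteration, so once it finishes only the last path is left. A minimal sketch that keeps every path instead, using a hypothetical helper named collect_paths (not part of the original question):

```python
import os

def collect_paths(folder):
    """Return the full path of every entry in `folder` as a list.

    Unlike the question's loop, which overwrites `infilename` on each
    iteration, this appends each path so none are lost.
    """
    paths = []
    for filename in os.listdir(folder):
        paths.append(os.path.join(folder, filename))
    return paths
```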

The following function returns the numbers missing from a list of integers:

def missing_numbers(num_list):
    original_list = list(range(num_list[0], num_list[-1] + 1))
    num_list = set(num_list)
    return list(num_list ^ set(original_list))
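The function above takes its bounds from the first and last elements of num_list, so it assumes the list is sorted and cannot report a missing 0 or 359 at either end. A sketch of a variant that takes the known bounds explicitly; the name missing_in_range and its parameters are hypothetical:

```python
def missing_in_range(num_list, start=0, stop=359):
    """Return the sorted numbers in [start, stop] that are absent from num_list.

    The bounds are passed in, so unsorted input works and gaps at either
    end of the range are reported too.
    """
    expected = set(range(start, stop + 1))
    return sorted(expected - set(num_list))

# Parse file names like '3.csv' into integers before calling it
names = ["0.csv", "2.csv", "1.csv", "5.csv"]
numbers = [int(name.split(".")[0]) for name in names]
print(missing_in_range(numbers, 0, 5))  # [3, 4]
```

With stop=5 and files 0.csv to 2.csv only, this reports [3, 4, 5], whereas missing_numbers([0, 1, 2]) returns an empty list because it never looks past 2.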

How can I feed the file names from the first snippet into this function? Any ideas?

Upvotes: 1

Views: 143

Answers (2)

Mr. T

Reputation: 12410

Your infilename doesn't collect all the file names in the folder, because you overwrite it on each iteration of the loop, so it only holds the last entry. How about this solution, using comprehensions:

import os
folder = '..Acad/Code'
# Separate the file name from its extension for each file in the folder
filenumbers = {x.split(".")[0] for x in os.listdir(folder)}
# Compare the numbers in the expected range with the file numbers in the folder
missingnumbers = [i for i in range(360) if str(i) not in filenumbers]
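The same approach wrapped in a function and exercised in a throwaway directory. This is a sketch: the function name missing_numbers_in_folder and the total parameter are hypothetical, and the demo files are created only for illustration.

```python
import os
import tempfile

def missing_numbers_in_folder(folder, total=360):
    """Return the numbers in range(total) with no '<n>.<ext>' file in folder."""
    # Build the set of file stems once, so each membership test is O(1) on average
    filenumbers = {name.split(".")[0] for name in os.listdir(folder)}
    return [i for i in range(total) if str(i) not in filenumbers]

# Demo: create every file from 0.csv to 359.csv except 7.csv and 42.csv
with tempfile.TemporaryDirectory() as tmp:
    for i in range(360):
        if i not in (7, 42):
            open(os.path.join(tmp, f"{i}.csv"), "w").close()
    print(missing_numbers_in_folder(tmp))  # [7, 42]
```

Because the comparison is between strings, a zero-padded name such as 007.csv would not count as 7.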

Upvotes: 2

ted

Reputation: 14744

First build a set of the integers you already have:

import os
my_set = set(int(f.split('.csv')[0]) for f in os.listdir('./') if os.path.isfile(f) and f.endswith('.csv'))

Then compare it with the set of all expected integers:

missing_ints = set(range(max(my_set) + 1)) - my_set
missing_files = [str(i) + '.csv' for i in sorted(missing_ints)]

This gives you the integers that do not appear among your file names.

So if you have 0.csv, 1.csv and 3.csv, then my_set is {0, 1, 3}, max(my_set) is 3, set(range(max(my_set) + 1)) is {0, 1, 2, 3}, and the difference missing_ints = set(range(max(my_set) + 1)) - my_set is {2}, so missing_files = ['2.csv'].
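A runnable version of this worked example, as a sketch assuming the folder holds at least one .csv file and every .csv name is a plain integer (otherwise max() or int() raises ValueError). The helper name missing_csv_files is hypothetical. Note that it only reports gaps below the largest number present; if 359.csv itself might be missing, set(range(360)) is the safer expected set.

```python
import os
import tempfile

def missing_csv_files(folder):
    """Return the missing '<n>.csv' names below the largest number present."""
    # Integers encoded in the .csv file names, e.g. '3.csv' -> 3
    my_set = {
        int(f[: -len(".csv")])
        for f in os.listdir(folder)
        if f.endswith(".csv") and os.path.isfile(os.path.join(folder, f))
    }
    # Every integer from 0 up to and including the largest one found
    missing_ints = set(range(max(my_set) + 1)) - my_set
    return [f"{i}.csv" for i in sorted(missing_ints)]

with tempfile.TemporaryDirectory() as tmp:
    for name in ("0.csv", "1.csv", "3.csv"):
        open(os.path.join(tmp, name), "w").close()
    print(missing_csv_files(tmp))  # ['2.csv']
```

Joining folder and f before calling os.path.isfile avoids depending on the current working directory, which the one-liner above does because it lists './'.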

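With a large number of files, working with sets is faster than checking each number against a list of strings: a set membership test is a hash lookup (constant time on average), whereas `in` on a list scans it element by element.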
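A small sketch contrasting the two lookups on a hypothetical listing where only the even-numbered files exist. Both functions return the same result; only the cost of each `in` test differs. Timings depend on the machine, so no numbers are claimed here.

```python
import timeit

N = 2000
# Hypothetical folder listing: only the even-numbered files exist
stems_list = [str(i) for i in range(0, N, 2)]
stems_set = set(stems_list)

def missing_via_list():
    # `in` on a list scans it element by element: O(len(stems_list)) per test
    return [i for i in range(N) if str(i) not in stems_list]

def missing_via_set():
    # `in` on a set is a hash lookup: O(1) per test on average
    return [i for i in range(N) if str(i) not in stems_set]

assert missing_via_list() == missing_via_set()
print("list:", timeit.timeit(missing_via_list, number=10))
print("set: ", timeit.timeit(missing_via_set, number=10))
```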

Upvotes: 1
