Reputation: 925
I have about 45,000 files. My goal is to extract one specific line from each file and collect all of those lines in a single file.
I tried using glob.glob, but the files it returns seem to be in a mixed-up order.
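```python
import glob

# diri: directory holding the .out files (defined earlier in the script, not shown)
filin = diri + '*.out'
list_of_files = glob.glob(filin)
print(list_of_files)
with open("A.txt", "w") as fout:
    for fileName in list_of_files:
        data_list = open(fileName, 'r').readlines()
        # copy one line (index 12) of each file into A.txt
        fout.write(data_list[12])
```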
This is the code I used; most of it is borrowed from someone else's code on this forum.
I would like to read all ".out" files in order. Each file holds one minute of data: for example, one file contains data for 2014/1/1 00:00 and the next one contains data for 2014/1/1 00:01, so reading the files in order is very important. However, when I use glob.glob and print list_of_files as above, the file order looks quite mixed. How can I solve this?
Also, as the code shows, I want to read the 12th line from the top of each file, but I repeatedly get an index-out-of-range error.
I realize the question is not very well organized. Any ideas or help would be much appreciated.
P.S. The files are named like Data_201308032343.out, Data_201308032344.out, Data_201308032345.out ......
Thank you.
Upvotes: 1
Views: 1572
Reputation: 2576
```python
list_of_files = sorted(glob.glob(filin))
```
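Plain sorted() is enough for these files because the timestamps in names like Data_201308032343.out are fixed-width and zero-padded, so alphabetical order is also chronological order. A minimal sketch that makes the time ordering explicit, assuming every name follows the Data_YYYYMMDDhhmm.out pattern from the question (timestamp_of is a hypothetical helper):

```python
import os
from datetime import datetime

def timestamp_of(path):
    """Parse the minute timestamp from a name like Data_201308032343.out.

    Assumes the Data_YYYYMMDDhhmm.out naming pattern shown in the question.
    """
    name = os.path.basename(path)              # e.g. 'Data_201308032343.out'
    stamp = name[len("Data_"):-len(".out")]    # e.g. '201308032343'
    return datetime.strptime(stamp, "%Y%m%d%H%M")

names = ["Data_201308032345.out", "Data_201308032343.out", "Data_201308032344.out"]

# Sort chronologically using the parsed timestamp as the key.
by_time = sorted(names, key=timestamp_of)
print(by_time)  # ['Data_201308032343.out', 'Data_201308032344.out', 'Data_201308032345.out']

# Because the digits are fixed-width, plain string sorting gives the same order.
print(by_time == sorted(names))  # True
```

Sorting on the parsed datetime rather than the raw string also makes a malformed name fail loudly with ValueError instead of silently landing in the wrong position.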
data_list[12] is the 13th line of the file, because lists are zero-indexed; the 12th line is data_list[11]. That might be the cause of the "index out of range" exception: a file with only 12 lines has no data_list[12].
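To read the 12th line from the top, the index is 11. A minimal sketch of a helper that also copes with short files by returning None instead of raising IndexError; nth_line is a hypothetical name, and itertools.islice is used so the file is not read into a list first:

```python
import os
import tempfile
from itertools import islice

def nth_line(path, n):
    """Return line n (1-based) of the file at path, or None if the file is shorter."""
    with open(path) as infile:
        # islice(infile, n - 1, n) yields at most one line: the one at zero-based index n - 1
        for line in islice(infile, n - 1, n):
            return line
    return None

# Demo: a throwaway file with 12 lines, "line 1" .. "line 12".
with tempfile.NamedTemporaryFile("w", suffix=".out", delete=False) as tmp:
    tmp.write("".join("line %d\n" % i for i in range(1, 13)))
demo_path = tmp.name

twelfth = nth_line(demo_path, 12)
thirteenth = nth_line(demo_path, 13)
os.remove(demo_path)

print(repr(twelfth))  # 'line 12\n'
print(thirteenth)     # None, because the file has only 12 lines
```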
Upvotes: 1
Reputation: 32299
As the os.listdir documentation states, directory entries are returned in arbitrary order, and glob.glob does not sort them either. If you want a specific order, you need to impose it yourself:
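```python
import glob

# input_fileglob: a pattern such as diri + '*.out' (defined elsewhere)
list_of_filenames = glob.glob(input_fileglob)
sorted_list_of_filenames = sorted(list_of_filenames)
with open("A.txt", 'w') as outfile:
    for filename in sorted_list_of_filenames:
        data_list = open(filename).readlines()
        # index 12 is the 13th line; use 11 for the 12th
        outfile.write(data_list[12])
```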
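Putting both fixes together, here is a sketch that sorts the file list, reads only up to the wanted line of each file, and closes every input file via `with`. It assumes files with fewer than 12 lines should be skipped and reported rather than abort the run, which neither answer specifies; `collect_lines` and its parameters are hypothetical names.

```python
import glob
import os
import shutil
import tempfile
from itertools import islice

def collect_lines(pattern, out_path, line_number=12):
    """Copy line `line_number` (1-based) of every file matching `pattern`
    into out_path, in sorted filename order.

    Files with fewer lines are skipped; their names are returned.
    """
    too_short = []
    with open(out_path, "w") as outfile:
        for filename in sorted(glob.glob(pattern)):
            with open(filename) as infile:
                # next(..., None) gives the wanted line, or None if the file is too short
                line = next(islice(infile, line_number - 1, line_number), None)
            if line is None:
                too_short.append(filename)
            else:
                outfile.write(line)
    return too_short

# Demo in a throwaway directory: files are created out of order on purpose,
# and the minute-46 file has only 3 lines.
demo_dir = tempfile.mkdtemp()
for minute, n_lines in ((45, 12), (43, 12), (46, 3), (44, 12)):
    name = os.path.join(demo_dir, "Data_2013080323%02d.out" % minute)
    with open(name, "w") as f:
        f.write("".join("minute %d, line %d\n" % (minute, i)
                        for i in range(1, n_lines + 1)))

out_path = os.path.join(demo_dir, "A.txt")
skipped = collect_lines(os.path.join(demo_dir, "*.out"), out_path)
with open(out_path) as f:
    result = f.read().splitlines()
shutil.rmtree(demo_dir)

print(result)  # ['minute 43, line 12', 'minute 44, line 12', 'minute 45, line 12']
print([os.path.basename(p) for p in skipped])  # ['Data_201308032346.out']
```

Returning the skipped names instead of raising keeps one malformed file from stopping a 45,000-file run, while still letting you check afterwards for gaps in the minute sequence.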
Upvotes: 2