Reputation: 381
I have a collection of binary files with names like this:
d010-recomb.bin
d011-recomb.bin
.............
.............
.............
d100-recomb.bin
Using the Python glob module, I can access all the files in a folder and process them further:
import glob
binary = sorted(glob.glob('C:/Users/Desktop/bin/*.bin'))
I can also restrict which files are matched.
For example, the following code gives me all the files from d010-recomb.bin to d019-recomb.bin:
binary = sorted(glob.glob('C:/Users/Desktop/bin/d01*.bin'))
But with this kind of pattern I can't select a range of files such as d015 to d025.
Please tell me what I can do to gain access to these files.
Upvotes: 0
Views: 686
Reputation: 20997
You can either filter the list by the number in each file name:
import os

def filter_path(path, l, r):
    # Characters 1-3 of the base name hold the number, e.g. 'd015-recomb.bin' -> 15
    i = int(os.path.basename(path)[1:4])
    return l <= i <= r

result = [i for i in binary if filter_path(i, 19, 31)]
If you are 100% certain which files the directory contains, you can slice the sorted list by position:
result = binary[19:30]
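Or, once the list is sorted, you can find the indices of the first and last file and slice between them:
import os
folder = 'C:/Users/Desktop/bin'
# list.index raises ValueError if the file is not in the list
l = binary.index(os.path.join(folder, 'd015-recomb.bin'))
r = binary.index(os.path.join(folder, 'd023-recomb.bin'))
result = binary[l:r+1]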
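If the boundary files might be missing, a binary search on the sorted list avoids the exact lookup. This is only a sketch: `slice_sorted_names` is a made-up helper name, and it assumes the numbers are zero-padded to the same width so that alphabetical order matches numeric order.

```python
import bisect

def slice_sorted_names(sorted_names, first, last):
    """Return the entries of a sorted list that fall between first and last, inclusive.

    Hypothetical helper: neither boundary has to be present in the list.
    """
    lo = bisect.bisect_left(sorted_names, first)   # first entry >= first
    hi = bisect.bisect_right(sorted_names, last)   # one past the last entry <= last
    return sorted_names[lo:hi]

names = ['d%03d-recomb.bin' % i for i in (10, 14, 16, 20, 25, 26)]
print(slice_sorted_names(names, 'd015-recomb.bin', 'd025-recomb.bin'))
```

The same call works on full paths as long as every entry and both boundaries share the same directory prefix.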
Upvotes: 1
Reputation: 1121406
Filter the list afterwards; either turn the filename portion to an int
or create a range of strings that are to be included:
included = {'d{:03d}'.format(i) for i in range(15, 26)} # a set
binary = sorted(f for f in glob.glob('C:/Users/Desktop/bin/*.bin') if f[21:25] in included)
The above code generates the strings 'd015' through 'd025' as a set for fast membership testing, then tests the first 4 characters of each file name against that set. Because glob() returns full paths, we slice off the directory portion for that to work.
For variable paths, I'd store the slice offset, for speed, based on the path:
import glob, os
pattern = 'C:/Users/Desktop/bin/*.bin'
included = {'d{:03d}'.format(i) for i in range(15, 26)} # a set
offset = len(os.path.dirname(pattern)) + 1
binary = sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included)
Demo of the latter:
$ mkdir test
$ touch test/d014-recomb.bin
$ touch test/d015-recomb.bin
$ touch test/d017-recomb.bin
$ touch test/d018-recomb.bin
$ fg
bin/python2.7
>>> import os, glob
>>> pattern = '/tmp/stackoverflow/test/*.bin'
>>> included = {'d{:03d}'.format(i) for i in range(15, 26)} # a set
>>> offset = len(os.path.dirname(pattern)) + 1
>>> sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included)
['/tmp/stackoverflow/test/d015-recomb.bin', '/tmp/stackoverflow/test/d017-recomb.bin', '/tmp/stackoverflow/test/d018-recomb.bin']
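The other option mentioned at the top, turning the file name portion into an int, could look roughly like the sketch below. The helper name `files_in_range` and the regular expression are my own assumptions based on the `dNNN-recomb.bin` naming in the question; adjust them if your names differ.

```python
import glob
import os
import re

# Assumed naming scheme: 'd', three digits, then '-recomb.bin'
NAME_RE = re.compile(r'^d(\d{3})-recomb\.bin$')

def files_in_range(directory, low, high):
    """Return sorted paths of dNNN-recomb.bin files with low <= NNN <= high."""
    selected = []
    for path in glob.glob(os.path.join(directory, '*.bin')):
        match = NAME_RE.match(os.path.basename(path))
        # Skip names that do not follow the scheme instead of failing on int()
        if match and low <= int(match.group(1)) <= high:
            selected.append(path)
    return sorted(selected)
```

Calling `files_in_range('C:/Users/Desktop/bin', 15, 25)` would then give the same selection as the set-based version, without depending on a fixed slice offset.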
Upvotes: 0
Reputation: 91017
You'll probably have to add this restriction manually, as it cannot be expressed by a single glob pattern.
If you know exactly how the file names are built, you could do
import os
for i in range(19, 34):  # 19 to 33
    filename = "d%03d-recomb.bin" % i
    # Only report names that actually exist in the folder
    if os.path.exists(os.path.join('C:/Users/Desktop/bin', filename)):
        print(filename)
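If you would rather stay with glob, a range like d015 to d025 can be covered by combining several patterns that use character classes, one per leading-digit group. This is only a sketch: the function name `glob_range_15_to_25` is made up, and the two patterns are written by hand for this specific range and the `-recomb.bin` suffix from the question.

```python
import glob
import os

def glob_range_15_to_25(directory):
    """Collect d015..d025 by taking the union of two character-class patterns."""
    patterns = [
        'd01[5-9]-recomb.bin',  # d015 .. d019
        'd02[0-5]-recomb.bin',  # d020 .. d025
    ]
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)
```

For arbitrary bounds the patterns get awkward to derive, which is why filtering by number is usually the simpler route.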
Upvotes: 0