user2095624

Reputation: 381

Python: access selected files with the glob module

I have a collection of binary files which have names as so:

d010-recomb.bin
d011-recomb.bin
.............
.............
.............
d100-recomb.bin

Using the Python glob module, I can access all the files in a folder and process them further:

import glob
binary = sorted(glob.glob('C:/Users/Desktop/bin/*.bin')) 

I can also restrict which files I access with a pattern.

For example, the following code gives me all the files from d010-recomb.bin to d019-recomb.bin:

binary = sorted(glob.glob('C:/Users/Desktop/bin/d01*.bin'))

But with this kind of pattern I can't select a range such as d015 to d025, which crosses from d01x into d02x.

Please tell me what I can do to gain access to these files.

Upvotes: 0

Views: 686

Answers (3)

Vyktor

Reputation: 20997

You can either filter the list:

import os

def filter_path(path, l, r):
    # The three digits after the leading 'd', e.g. 'd015-recomb.bin' -> 15
    i = int(os.path.basename(path)[1:4])
    return l <= i <= r

result = [p for p in binary if filter_path(p, 15, 25)]  # d015 .. d025
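To see the filter end to end, here is a minimal sketch that runs it against a temporary directory of empty placeholder files. The directory and files are made up for the demo; with real data the list would come from your own glob call. It assumes every name starts with one letter followed by exactly three digits.

```python
import glob
import os
import tempfile


def filter_path(path, lo, hi):
    """True if the number in a name like 'd015-recomb.bin' lies in [lo, hi]."""
    # Assumes one leading letter followed by exactly three digits.
    number = int(os.path.basename(path)[1:4])
    return lo <= number <= hi


# Throwaway directory with empty stand-in files d010..d030 (made-up test data).
with tempfile.TemporaryDirectory() as tmpdir:
    for n in range(10, 31):
        open(os.path.join(tmpdir, 'd%03d-recomb.bin' % n), 'w').close()
    binary = sorted(glob.glob(os.path.join(glob.escape(tmpdir), '*.bin')))
    result = [os.path.basename(p) for p in binary if filter_path(p, 15, 25)]

print(result[0], result[-1], len(result))  # d015-recomb.bin d025-recomb.bin 11
```

glob.escape() is used so that unusual characters in the temporary directory name are not treated as wildcards.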

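If you are 100% sure that every file from d010 to d100 is present with no gaps, the list index is simply the number minus 10, so you can slice directly:

result = binary[5:16]  # indices 5..15 -> d015..d025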

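Or, since the list is sorted, you can look up the indices of the first and last files you want and slice between them (list.index() raises ValueError if a name is not in the list):

import os
folder = 'C:/Users/Desktop/bin'  # joined with os.path.join, like glob's own output
l = binary.index(os.path.join(folder, 'd015-recomb.bin'))
r = binary.index(os.path.join(folder, 'd025-recomb.bin'))
result = binary[l:r + 1]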
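Looking up exact names fails when the first or last file of the range happens to be missing. A sketch that swaps in the standard bisect module finds the boundaries by sorted position instead, so the bounds need not exist as files. It assumes all paths come from one directory (so sorting by full path equals sorting by basename); the function name slice_by_prefix and the sample listing are made up for illustration. The stop bound is exclusive, so 'd026' is used to include d025.

```python
import bisect
import os


def slice_by_prefix(sorted_paths, start, stop):
    """Return paths whose basename is >= start and < stop.

    Assumes sorted_paths comes from one directory and is sorted, so it is
    also sorted by basename. The bounds need not exist as files.
    """
    names = [os.path.basename(p) for p in sorted_paths]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, stop)
    return sorted_paths[lo:hi]


# Made-up listing in which d015 and d025 are missing on purpose.
binary = ['bin/d%03d-recomb.bin' % n for n in (10, 14, 16, 17, 22, 27, 30)]
print(slice_by_prefix(binary, 'd015', 'd026'))
# ['bin/d016-recomb.bin', 'bin/d017-recomb.bin', 'bin/d022-recomb.bin']
```

This relies on the numbers being zero-padded to the same width, so that string order matches numeric order.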

Upvotes: 1

Martijn Pieters

Reputation: 1121406

Filter the list afterwards; either turn the filename portion to an int or create a range of strings that are to be included:

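import glob

included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set: 'd015' .. 'd025'

# len('C:/Users/Desktop/bin/') == 21, so f[21:25] is the 'dNNN' part of the filename
binary = sorted(f for f in glob.glob('C:/Users/Desktop/bin/*.bin') if f[21:25] in included)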

The above code generates the strings 'd015' through 'd025' as a set for fast membership testing, then tests the first 4 characters of each filename against that set. Because glob() returns full paths, the slice skips the 21-character directory prefix before taking those 4 characters.

For a variable directory, I'd compute the slice offset once from the pattern, for speed, rather than hard-coding it:

import glob
import os

pattern = 'C:/Users/Desktop/bin/*.bin'
included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set
offset = len(os.path.dirname(pattern)) + 1

binary = sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included) 

Demo of the latter:

$ mkdir test
$ touch test/d014-recomb.bin
$ touch test/d015-recomb.bin
$ touch test/d017-recomb.bin
$ touch test/d018-recomb.bin
$ fg
bin/python2.7
>>> import os, glob
>>> pattern = '/tmp/stackoverflow/test/*.bin'
>>> included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set
>>> offset = len(os.path.dirname(pattern)) + 1
>>> sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included)
['/tmp/stackoverflow/test/d015-recomb.bin', '/tmp/stackoverflow/test/d017-recomb.bin', '/tmp/stackoverflow/test/d018-recomb.bin']
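The other option mentioned at the start of this answer, turning the filename portion into an int, can be sketched with a regular expression matched against the basename, so no slice offset is needed and names that don't fit the scheme are skipped. The pattern is an assumption based on the file names in the question; the function name select_range and the sample paths are made up for illustration.

```python
import os
import re

# Assumed naming scheme, taken from the question: 'd' + 3 digits + '-recomb.bin'.
NAME_RE = re.compile(r'd(\d{3})-recomb\.bin')


def select_range(paths, lo, hi):
    """Keep paths whose basename matches NAME_RE with a number in [lo, hi]."""
    selected = []
    for path in paths:
        match = NAME_RE.fullmatch(os.path.basename(path))
        if match and lo <= int(match.group(1)) <= hi:
            selected.append(path)
    return sorted(selected)


# Made-up paths; in practice pass glob.glob('C:/Users/Desktop/bin/*.bin').
paths = [
    'C:/Users/Desktop/bin/d014-recomb.bin',
    'C:/Users/Desktop/bin/d025-recomb.bin',
    'C:/Users/Desktop/bin/notes.bin',
    'C:/Users/Desktop/bin/d015-recomb.bin',
    'C:/Users/Desktop/bin/d026-recomb.bin',
]
print(select_range(paths, 15, 25))
# ['C:/Users/Desktop/bin/d015-recomb.bin', 'C:/Users/Desktop/bin/d025-recomb.bin']
```

re.fullmatch() requires Python 3.4 or later.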

Upvotes: 0

glglgl

Reputation: 91017

You'll probably have to add this restriction manually, as it cannot be accomplished by a glob pattern.

If you exactly know how the file names are built, you could do

import os
for i in range(15, 26):  # 15 to 25
    filename = "d%03d-recomb.bin" % i
    if os.path.exists(os.path.join('C:/Users/Desktop/bin', filename)):
        print(filename)
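A single glob pattern cannot express a numeric range like 15 to 25, but glob does support character classes such as [5-9], so the range can be covered by a few patterns split on the tens digit. A minimal sketch, hard-coded for d015 to d025; the function name is made up, other ranges need their own patterns, and the temporary directory of empty files is only test data.

```python
import glob
import os
import tempfile


def glob_d015_to_d025(directory):
    """Collect d015..d025 with two glob character-class patterns."""
    # glob has no numeric ranges, so the range is split by tens digit:
    #   d01[5-9] matches d015..d019, d02[0-5] matches d020..d025
    patterns = ['d01[5-9]-recomb.bin', 'd02[0-5]-recomb.bin']
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(found)


# Throwaway directory with empty stand-in files d010..d030 (made-up test data).
with tempfile.TemporaryDirectory() as tmpdir:
    for n in range(10, 31):
        open(os.path.join(tmpdir, 'd%03d-recomb.bin' % n), 'w').close()
    names = [os.path.basename(p) for p in glob_d015_to_d025(tmpdir)]

print(names[0], names[-1], len(names))  # d015-recomb.bin d025-recomb.bin 11
```

This stays manageable for short ranges; for arbitrary bounds, filtering the full listing in Python is simpler.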

Upvotes: 0
