Jürgen K.

Reputation: 3487

Cut out a sequence of files using glob in python

I have a directory with files like img-0001.jpg, img-0005.jpg, img-0006.jpg, ..., img-xxxx.jpg. I need a list of all files starting at 0238, that is, from img-0238.jpg onward. Numbers can be missing from the sequence: the next existing filename after that is img-0240.jpg.

Right now I use glob to get all filenames.

list_images = glob.glob(path_images + "*.jpg")

Thanks in advance

Edit:

-> The last filename is img-0315.jpg

Upvotes: 0

Views: 839

Answers (3)

facelessuser

Reputation: 1734

For something like this, you could try the wcmatch library. It's a library that aims to enhance file globbing and wildcard matching.

In this example, we enable brace expansion and demonstrate the pattern by filtering a list of files:

from wcmatch import glob

files = []
# Generate list of files from img-0000.jpg to img-0315.jpg
for x in range(316):
    files.append('path/img-{:04d}.jpg'.format(x))

print(glob.globfilter(files, 'path/img-{0238..0315}.jpg', flags=glob.BRACE))

And we get the following output:

['path/img-0238.jpg', 'path/img-0239.jpg', 'path/img-0240.jpg', 'path/img-0241.jpg', 'path/img-0242.jpg', 'path/img-0243.jpg', 'path/img-0244.jpg', 'path/img-0245.jpg', 'path/img-0246.jpg', 'path/img-0247.jpg', 'path/img-0248.jpg', 'path/img-0249.jpg', 'path/img-0250.jpg', 'path/img-0251.jpg', 'path/img-0252.jpg', 'path/img-0253.jpg', 'path/img-0254.jpg', 'path/img-0255.jpg', 'path/img-0256.jpg', 'path/img-0257.jpg', 'path/img-0258.jpg', 'path/img-0259.jpg', 'path/img-0260.jpg', 'path/img-0261.jpg', 'path/img-0262.jpg', 'path/img-0263.jpg', 'path/img-0264.jpg', 'path/img-0265.jpg', 'path/img-0266.jpg', 'path/img-0267.jpg', 'path/img-0268.jpg', 'path/img-0269.jpg', 'path/img-0270.jpg', 'path/img-0271.jpg', 'path/img-0272.jpg', 'path/img-0273.jpg', 'path/img-0274.jpg', 'path/img-0275.jpg', 'path/img-0276.jpg', 'path/img-0277.jpg', 'path/img-0278.jpg', 'path/img-0279.jpg', 'path/img-0280.jpg', 'path/img-0281.jpg', 'path/img-0282.jpg', 'path/img-0283.jpg', 'path/img-0284.jpg', 'path/img-0285.jpg', 'path/img-0286.jpg', 'path/img-0287.jpg', 'path/img-0288.jpg', 'path/img-0289.jpg', 'path/img-0290.jpg', 'path/img-0291.jpg', 'path/img-0292.jpg', 'path/img-0293.jpg', 'path/img-0294.jpg', 'path/img-0295.jpg', 'path/img-0296.jpg', 'path/img-0297.jpg', 'path/img-0298.jpg', 'path/img-0299.jpg', 'path/img-0300.jpg', 'path/img-0301.jpg', 'path/img-0302.jpg', 'path/img-0303.jpg', 'path/img-0304.jpg', 'path/img-0305.jpg', 'path/img-0306.jpg', 'path/img-0307.jpg', 'path/img-0308.jpg', 'path/img-0309.jpg', 'path/img-0310.jpg', 'path/img-0311.jpg', 'path/img-0312.jpg', 'path/img-0313.jpg', 'path/img-0314.jpg', 'path/img-0315.jpg']

So, we could apply this to a file search:

from wcmatch import glob

list_images = glob.glob('path/img-{0238..0315}.jpg', flags=glob.BRACE)

In this example, we've hard-coded the path. In your code, make sure path_images has a trailing / so that the pattern is constructed correctly; others have suggested this might be the issue. Print out your pattern to confirm it is what you expect.
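To make the trailing-slash check concrete, here is a minimal sketch of building the pattern from a directory and printing it before globbing. The helper name `build_pattern` and its parameters are hypothetical and not part of wcmatch; only the `{0238..0315}` brace syntax comes from the example above.

```python
def build_pattern(path_images, first, last):
    """Return a brace-range pattern such as 'path/img-{0238..0315}.jpg'."""
    # Add the separator only when it is missing, so 'path' and 'path/'
    # produce the same pattern.
    if path_images and not path_images.endswith("/"):
        path_images += "/"
    # Doubled braces are literal braces in str.format.
    return "{}img-{{{:04d}..{:04d}}}.jpg".format(path_images, first, last)


pattern = build_pattern("path", 238, 315)
print(pattern)  # inspect this before passing it to glob
```

The printed string can then be passed to `glob.glob(pattern, flags=glob.BRACE)` as shown earlier.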

Upvotes: 0

tripleee

Reputation: 189910

You can run several wildcard patterns that together match all files whose number is 023[89], 02[4-9][0-9], 030[0-9], or 031[0-5]:

import glob
import os

list_images = []
for pattern in ('023[89]', '02[4-9][0-9]', '030[0-9]', '031[0-5]'):
    list_images.extend(glob.glob(
        os.path.join(path_images, '*{0}.jpg'.format(pattern))))
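One way to convince yourself that these four patterns cover exactly 0238 through 0315 is to run them against simulated names with `fnmatch`, which implements the same wildcard syntax that `glob` uses for filenames. This is only a sketch: the helper `matches_any` and the generated name list are made up for the check.

```python
from fnmatch import fnmatch

patterns = ('023[89]', '02[4-9][0-9]', '030[0-9]', '031[0-5]')


def matches_any(name):
    # Same pattern construction as in the loop, minus the directory part.
    return any(fnmatch(name, '*{0}.jpg'.format(p)) for p in patterns)


# Simulated names img-0000.jpg .. img-0999.jpg; no real files are needed.
names = ['img-{:04d}.jpg'.format(n) for n in range(1000)]
matched = [int(name[4:8]) for name in names if matches_any(name)]

# True when the patterns select exactly the numbers 238..315.
print(matched == list(range(238, 316)))
```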

Or you can glob everything and just filter out the ones you don't want:

list_images = [x for x in glob.glob(os.path.join(path_images, "*.jpg"))
    if 238 <= int(x[-8:-4]) <= 315]
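The slice `x[-8:-4]` assumes exactly four digits directly before `.jpg`. If that might not hold, a variant that parses the trailing number could look like the sketch below. The helper names `image_number` and `in_range` are hypothetical, and the path list stands in for a real `glob.glob(...)` result.

```python
import os
import re


def image_number(path):
    """Return the trailing number of e.g. 'dir/img-0238.jpg', or None."""
    match = re.search(r'(\d+)\.jpg$', os.path.basename(path))
    return int(match.group(1)) if match else None


def in_range(path, first=238, last=315):
    number = image_number(path)
    return number is not None and first <= number <= last


# Simulated glob result; in real use this would come from glob.glob(...).
paths = ['dir/img-0001.jpg', 'dir/img-0238.jpg', 'dir/img-0240.jpg',
         'dir/img-0315.jpg', 'dir/img-0316.jpg', 'dir/notes.jpg']
list_images = [p for p in paths if in_range(p)]
print(list_images)
```

Files without a trailing number, such as `notes.jpg` here, are skipped instead of raising a `ValueError` from `int()`.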

Upvotes: 0

andnik

Reputation: 2804

Glob doesn't support regex filtering, but you can filter the list right after you receive all the matching files. Here is how it would look using re:

import glob
import re

# The group keeps the \.jpg$ anchor attached to every alternative.
list_images = [f for f in glob.glob(path_images + "*.jpg")
    if re.search(r'(?:[1-9]\d{3,}|0[3-9]\d{2,}|02[4-9]\d|023[89])\.jpg$', f)]

The regular expression verifies that the filename ends with a zero-padded number (four or more digits) that is greater than or equal to 0238, followed by .jpg.

You can play around with the regular expression at https://regex101.com/

Basically, we check whether the number:

  • starts with [1-9] followed by at least 3 digits
  • or starts with 0[3-9] followed by at least 2 digits
  • or starts with 02[4-9] followed by any 1 digit
  • or starts with 023 followed by either 8 or 9.
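With an alternation like this, grouping matters: without parentheses, a trailing `\.jpg$` binds only to the last alternative, so the other branches can match digits anywhere in the path. The sketch below compares an ungrouped and a grouped form on a few made-up names; the sample paths, including the `shots-2019` directory, are invented for illustration.

```python
import re

# Ungrouped: only the last alternative is tied to "\.jpg$".
ungrouped = re.compile(r'[1-9]\d{3,}|0[3-9]\d{2,}|02[4-9]\d|023[8-9]\.jpg$')
# Grouped: every alternative must be followed by ".jpg" at the end.
grouped = re.compile(r'(?:[1-9]\d{3,}|0[3-9]\d{2,}|02[4-9]\d|023[89])\.jpg$')

samples = ['img-0237.jpg', 'img-0238.jpg', 'img-0240.jpg', 'img-0315.jpg',
           'shots-2019/img-0001.jpg']
for name in samples:
    # Columns: name, ungrouped result, grouped result.
    print(name, bool(ungrouped.search(name)), bool(grouped.search(name)))
```

The two forms agree on the first four names and differ on the last one, where the ungrouped pattern picks up the `2019` in the directory name.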

But it would probably be easier to do a simple string comparison, which works here because the numbers are zero-padded to a fixed width:

list_images = [f for f in glob.glob(path_images + "*.jpg")
     if "0237" < f[-8:-4] < "0316"]
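Comparing the digits as text gives the same answer as comparing them as numbers only because every name carries the same number of zero-padded digits. A small sketch of that caveat, with a hypothetical helper `in_range_by_text` and made-up names:

```python
def in_range_by_text(path, first="0238", last="0315"):
    # Slice out the four characters before ".jpg" and compare them as text.
    return first <= path[-8:-4] <= last


names = ['img-0237.jpg', 'img-0238.jpg', 'img-0315.jpg', 'img-0316.jpg']
print([n for n in names if in_range_by_text(n)])

# Equal-width strings order the same way as the numbers they spell.
print("0240" > "0237")  # True
# Without padding the text order diverges from the numeric order.
print("240" > "1000")   # True as text, although 240 < 1000
```

If the filenames ever stop being fixed-width, converting the slice with `int()` is the safer choice.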

Upvotes: 1
