Côme Schaeffer

Reputation: 37

Matching, filtering and grouping in list comprehension

I have a folder containing some images of the format:

wheel_0.jpg, tyre_2.jpg

but also some other formats:

bar_0.heic

and files like this one, which I don't want to match:

hello.jpg

I want to create a list of the names of images that are in JPEG format and do not end with _0, stored without their extension. I already wrote this piece of code, which works fine:

import os
import re

images = os.listdir("images")
images_to_search = []
for image in images:
    # Raw strings keep the backslashes in the patterns intact
    re_obj = re.search(r"(.+)(_\d+)(\..+)", image)
    if re_obj:
        if re_obj.group(3) == ".jpg" and re_obj.group(2) != "_0":
            images_to_search.append(re.sub(r"\.jpg", '', image))

Is there any way to turn this for loop into a list comprehension?

Upvotes: 1

Views: 102

Answers (4)

not_speshal

Reputation: 23146

I think you can do this without using re, like so:

>>> [f.replace(".jpg","") for f in images if "_" in f and f.endswith(".jpg") and not f.endswith("_0.jpg")]
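A minimal regex-free sketch along the same lines, assuming Python 3.9+ for str.removesuffix. It also checks that the text after the last underscore is a non-empty run of digits other than "0", which is closer to the _\d+ part of the question's regex than a bare "_" in f test. The helper name has_nonzero_index is hypothetical.

```python
images = ['tyre_2.jpg', 'bar_0.heic', 'hello.jpg', 'wheel_0.jpg']


def has_nonzero_index(stem):
    """Hypothetical helper: True for '<base>_<digits>' where digits != '0'."""
    # rpartition returns ('', '', stem) when there is no underscore
    base, sep, num = stem.rpartition("_")
    return bool(base) and bool(sep) and num.isdecimal() and num != "0"


# removesuffix strips only a trailing ".jpg"; str.replace would remove
# every ".jpg" found anywhere in the name.
images_to_search = [
    stem
    for stem in (f.removesuffix(".jpg") for f in images if f.endswith(".jpg"))
    if has_nonzero_index(stem)
]
print(images_to_search)  # ['tyre_2']
```

Like the original regex, this keeps names such as wheel_00.jpg, because only an index of exactly "0" is rejected.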

Upvotes: 0

Wiktor Stribiżew

Reputation: 626754

You can use

import os
import re

images = os.listdir("images")
rx = re.compile(r'.+_(?!0\.)\d+\.jpg$')
# maxsplit=1 splits only at the last dot, so only the extension is removed
images_to_search = [x.rsplit('.', 1)[0] for x in filter(rx.match, images)]
# => ['tyre_2']

See the regex demo. The regex matches

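  • .+ - one or more characters other than line break characters, as many as possible
  • _(?!0\.)\d+ - an underscore not immediately followed by 0., and then one or more digits
  • \.jpg - the literal text .jpg
  • $ - the end of the string (or the position just before a final newline).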

re.match only looks for a match at the start of the string, so there is no need to prepend the pattern with ^.
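A short sketch of how the anchoring works in practice. It contrasts re.match with re.search, and shows re.fullmatch (available since Python 3.4) as an alternative that anchors both ends, so the trailing $ can be dropped. The only behavioural difference shown is a hypothetical name ending in a newline, which $ tolerates and fullmatch does not.

```python
import re

rx = re.compile(r'.+_(?!0\.)\d+\.jpg$')

# match() anchors only at the start; the pattern's own "$" anchors the end.
print(bool(rx.match('tyre_2.jpg')))   # True
print(bool(rx.match('wheel_0.jpg')))  # False

# search() may begin anywhere in the string, match() may not.
print(re.search(r'jpg', 'tyre_2.jpg') is not None)  # True
print(re.match(r'jpg', 'tyre_2.jpg') is None)       # True

# fullmatch() anchors both ends, so "$" is no longer needed.
rx_full = re.compile(r'.+_(?!0\.)\d+\.jpg')
print(bool(rx_full.fullmatch('tyre_2.jpg')))      # True
print(bool(rx_full.fullmatch('tyre_2.jpg.bak')))  # False

# "$" also matches just before a final newline; fullmatch() does not.
print(bool(rx.match('tyre_2.jpg\n')))            # True
print(bool(rx_full.fullmatch('tyre_2.jpg\n')))   # False
```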

See the Python demo:

import re

images = ['tyre_2.jpg', 'bar_0.heic', 'hello.jpg', 'wheel_0.jpg']
rx = re.compile(r'.+_(?!0\.)\d+\.jpg$')
print([x.rsplit('.', 1)[0] for x in filter(rx.match, images)])
# => ['tyre_2']
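A note on stripping the extension: str.rsplit without a maxsplit argument splits at every dot, so [0] returns everything before the first dot. The sketch below uses a hypothetical file name with an extra dot (which the regex above does accept) to compare it with rsplit('.', 1) and os.path.splitext.

```python
import os.path

name = 'my.photo_2.jpg'  # hypothetical name with an extra dot before the extension

print(name.rsplit('.')[0])        # my          (splits at every dot)
print(name.rsplit('.', 1)[0])     # my.photo_2  (splits at the last dot only)
print(os.path.splitext(name)[0])  # my.photo_2
```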

Upvotes: 1

coolcoollemon

Reputation: 45

You may try this:

images = [x[:-4] for x in images if x[-4:] == ".jpg" and x[-6:] != "_0.jpg"]
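A pathlib-based variant of this slicing idea, as a sketch rather than the answerer's code: PurePath.suffix gives the final extension and PurePath.stem the name without it, so no hard-coded slice lengths are needed. The "_" in p.stem check is an addition so that hello.jpg is excluded as the question requires; like the slicing version, it does not verify that the text after the underscore is numeric.

```python
from pathlib import PurePath

images = ['tyre_2.jpg', 'bar_0.heic', 'hello.jpg', 'wheel_0.jpg']

# suffix is ".jpg" for "tyre_2.jpg" and stem is "tyre_2"
images_to_search = [
    p.stem
    for p in map(PurePath, images)
    if p.suffix == ".jpg" and "_" in p.stem and not p.stem.endswith("_0")
]
print(images_to_search)  # ['tyre_2']
```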

Upvotes: 0

hd1

Reputation: 34657

import os, re
images = os.listdir('images')
# Needs Python 3.8+ (:=); m is None for names like hello.jpg, so it is tested first
images_to_search = [m.group(1) + m.group(2) for image in images if (m := re.search(r"(.+)(_\d+)(\..+)", image)) and m.group(3) == '.jpg' and m.group(2) != '_0']
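To avoid running the question's regex twice per name, and to skip names it does not match at all (re.search returns None for hello.jpg), one option that also works before Python 3.8 is to feed a generator of match objects into the comprehension. This is a sketch using a hard-coded sample list in place of os.listdir.

```python
import re

images = ['tyre_2.jpg', 'bar_0.heic', 'hello.jpg', 'wheel_0.jpg']
pattern = re.compile(r"(.+)(_\d+)(\..+)")  # the question's regex, compiled once

# The inner generator runs the regex once per name; the outer comprehension
# drops non-matches (m is None) before calling m.group(...).
images_to_search = [
    m.group(1) + m.group(2)
    for m in (pattern.search(image) for image in images)
    if m and m.group(3) == ".jpg" and m.group(2) != "_0"
]
print(images_to_search)  # ['tyre_2']
```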

Upvotes: 0
