Phil G

Finding files using regular expressions/wildcards

Within a particular directory, I have a series of files that are labelled sequentially: image0000.png, image0001.png, image0002.png, etc. They are labelled by number, but I don't necessarily know how many leading zeros there are in the filename, i.e. whether it will be image0001.png or image00001.png.

Within a bash script, I want to find one file at a time (in a for loop) and then apply some processing to it. The search could start at zero and stop before the last file, or it could use a step other than one. For example, I might want image0000.png, image0001.png, image0002.png and so on, or I might start at image0010.png and take every other file, so the next two would be image0012.png and image0014.png.

To try to find the first file (image0000.png), I've tried using find and ls, with the following outputs:

$ find video/figs/ -name 'image*[0]0.png'
video/figs/image00100.png
video/figs/image00000.png
$ ls -l video/figs/image*[0]0.png
-rw-r--r-- 1 user machine 165K Feb 19 09:06 video/figs/image00000.png
-rw-r--r-- 1 user machine 207K Feb 19 09:06 video/figs/image00100.png

Similar results occur when finding the second file: find video/figs/ -name 'image*[0]1.png' finds image00101.png and image00001.png. So it finds the file I want (image00001.png), but also one that I don't (image00101.png). Can anyone help me understand why, and fix it?

Upvotes: 1

Answers (3)

Phil G

Sincere thanks to everyone who contributed an answer; perhaps I explained it poorly, or I was too wedded to the code I'd already written to use any of the provided answers. However, I've found the following solutions:

1) Why did I find more files than I expected?

find video/figs/ -name 'image*[0]0.png' does not use regular expressions: -name takes a shell-style glob pattern, in which * matches any string of characters (including none) and [0] matches exactly one 0. The pattern therefore means image<anything>00.png, which matches image00100.png as well as image00000.png. There is no way, using the -name option, to restrict * to repeats of a single given character (in this case, zero or more 0s).
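A small bash sketch that reproduces the behaviour without touching the file system. It relies on the fact that an unquoted right-hand side inside [[ ]] is treated as a glob pattern, which for this pattern behaves the same way as find -name; the list of names is made up for the demonstration.

```shell
#!/usr/bin/env bash
# The glob from the question: '*' matches any run of characters (including
# none) and '[0]' matches exactly one '0', so this means image<anything>00.png.
pattern='image*[0]0.png'
matches=()
for name in image00000.png image00100.png image00001.png image00101.png; do
    # Unquoted $pattern on the right of == is used as a glob, not a literal string.
    if [[ $name == $pattern ]]; then
        matches+=("$name")
    fi
done
printf '%s\n' "${matches[@]}"
```

Both image00000.png and image00100.png are accepted, because * is free to absorb the leading "001".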

2) How do I find the image files with an unknown number of padding zeroes?

The following is a minimal working example (MWE) from my final code. It demonstrates how to search within a given directory SEARCH_DIR. Note that find also descends into subdirectories by default; add -maxdepth 1 right after the directory to search SEARCH_DIR only.

f1=0 # Starting number
f2=10 # End number
df=2 # Step between image numbers
for ((f=f1; f<=f2; f+=df)); do
   # .*/ covers the directory part, 0* allows any number of leading zeros, \. is a literal dot
   export iFile=$(find "$SEARCH_DIR" -regex ".*/image0*${f}\.png")
done

iFile=$(...) is command substitution: the output of the command inside $( ) is assigned to the variable iFile, and export makes that variable available to sub-processes. The command inside the parentheses is the part I was looking for; it breaks down as follows:
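A minimal sketch of both pieces, with printf standing in for the find command (an assumption for the demo): command substitution stores the output in iFile, but a child process (here a separate bash -c) only sees the variable once it has been exported. Within the same script, the variable is usable without export.

```shell
#!/usr/bin/env bash
# Start from a clean state so iFile is neither set nor exported.
unset iFile
# Command substitution: capture the command's output (printf stands in for find).
iFile=$(printf '%s' image00001.png)
# A child process does not inherit a variable that has not been exported.
before=$(bash -c 'printf "%s" "${iFile:-unset}"')
export iFile
# After export, the child process sees the value.
after=$(bash -c 'printf "%s" "${iFile:-unset}"')
echo "child before export: $before"
echo "child after export:  $after"
```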

a) find $SEARCH_DIR specifies the location for the search

b) -regex tells find to match against a regular expression instead of a glob pattern; regular expressions are more expressive than shell wildcards and let me limit what * repeats

c) '.*/image0*'$f'.png': -regex has to match the entire path (e.g. video/figs/image00001.png), not just the file name, so the leading .*/ is needed to cover the directory part. The 0* now behaves as I originally wanted: in a regular expression, * means zero or more repetitions of the preceding item, which here is 0 (to allow zero or more of any digit, I would use [0-9]* instead). The $f inserts the number for the current loop iteration. Strictly, the . in .png is also a regex metacharacter that matches any single character, so \.png is the exact form.
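A quick self-check of the regex, assuming a throwaway directory created with mktemp and four made-up file names. For each number, the pattern should pick out only the file whose number is padded with zeros, so image00101.png is not returned when looking for 1. The dot is escaped as \. so that it only matches a literal dot.

```shell
#!/usr/bin/env bash
# Hypothetical test data in a temporary directory.
SEARCH_DIR=$(mktemp -d)
touch "$SEARCH_DIR"/image{00000,00001,00100,00101}.png
found=()
for ((f=0; f<=1; f++)); do
    # Whole-path match: any directory, "image", zero or more 0s, the number, ".png".
    iFile=$(find "$SEARCH_DIR" -regex ".*/image0*${f}\.png")
    found+=("$(basename "$iFile")")
done
printf '%s\n' "${found[@]}"
rm -r "$SEARCH_DIR"
```

For f=0 only image00000.png matches (image00100.png has a 1 among its zeros), and for f=1 only image00001.png matches.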

Upvotes: 0

user unknown

I suspect you don't want to descend into subdirectories and collect numbered files spread across them.

So find isn't necessary.

ls image*{08..10}.png
image00010.png  image0008.png  image0009.png  image0010.png  image008.png  image009.png

Part 2 of your question, only find every other file:

ls image*{08..10..2}.png
image00010.png  image0008.png  image0010.png  image008.png

If you know for loops from other languages, {08..10..2} works like

for (i in 8 to 10 by 2) 

or

for (int i=8; i <= 10; i+=2) 
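A small sketch of what the shell does with these patterns, assuming bash 4 or later (the zero-padded {08..10} form and the ..2 step need it): brace expansion runs first and produces one glob pattern per number, and only then are those patterns matched against file names. Quoting 'image*' here stops the globbing step so the generated patterns are visible.

```shell
#!/usr/bin/env bash
# {08..10..2} yields 08 and 10; the leading 0 makes bash pad to two digits.
numbers=( {08..10..2} )
# The quoted 'image*' is not globbed, but the unquoted braces still expand.
patterns=( 'image*'{08..10..2}.png )
printf '%s\n' "${numbers[@]}" "${patterns[@]}"
```

So ls image*{08..10..2}.png is equivalent to ls image*08.png image*10.png.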

With a plain glob, the search can't be restricted to find image00010.png but not imageAB010.png, because * matches any characters, not just zeros.
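One alternative that is not part of the answer above: bash's extglob option adds the pattern *(0), which matches zero or more 0s only, so image*(0)10.png accepts image00010.png but rejects imageAB010.png. A minimal sketch on made-up names:

```shell
#!/usr/bin/env bash
# Enable extended globbing before the patterns below are parsed.
shopt -s extglob
accepted=()
for name in image00010.png image0010.png image10.png imageAB010.png image00110.png; do
    # *(0) means "zero or more 0s", unlike plain '*' which means "anything".
    if [[ $name == image*(0)10.png ]]; then
        accepted+=("$name")
    fi
done
printf '%s\n' "${accepted[@]}"
```

With extglob enabled, the same pattern also works as a filename glob, e.g. ls video/figs/image*(0)10.png.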

The reason to exclude 101 is still unclear. Maybe it's only a sorting thing.

For a directory other than the current one, there is no big difference; just add the path:

ls video/figs/image*{08..10..2}.png

Note that instead of ls, you can call the program you actually want to run on the files directly, provided it can handle more than one file at a time, as ls does.
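If the program can only take one file at a time, a loop over the same pattern works instead. This is a minimal sketch: process_image is a hypothetical placeholder for the real command, and the temporary directory and file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# An unmatched pattern expands to nothing instead of staying literal.
shopt -s nullglob
dir=$(mktemp -d)
touch "$dir"/image{0008,0009,0010}.png
processed=()
# Hypothetical stand-in: replace with the real per-file command.
process_image() {
    processed+=("$(basename "$1")")
}
# Brace expansion gives image*08.png and image*10.png, which are then globbed.
for file in "$dir"/image*{08..10..2}.png; do
    process_image "$file"
done
printf '%s\n' "${processed[@]}"
rm -r "$dir"
```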

Upvotes: 0

Maroun

I would use ls and grep for that:

ls | grep -oP '0*[1-9]+\.png'

Example:

$:/tmp/test$ ls
00001.png 00002.png 00010.png 00013.png 00201.png

$:/tmp/test$ ls | grep -oP '0*[1-9]+\.png'
00001.png
00002.png
00013.png
01.png
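The 01.png line comes from -o, which prints only the part of each line that matched; because the pattern isn't anchored, 00201.png yields its tail 01.png. 00010.png is missing because [1-9]+ excludes numbers that contain a 0. A sketch comparing this with a whole-line match (-x), using grep -E instead of -P so it also runs with non-GNU grep, and feeding the example names in with printf rather than ls:

```shell
#!/usr/bin/env bash
names=$(printf '%s\n' 00001.png 00002.png 00010.png 00013.png 00201.png)
# Unanchored with -o: only the matching part is printed, so 00201.png gives 01.png.
partial=$(printf '%s\n' "$names" | grep -oE '0*[1-9]+\.png')
# With -x the whole name must match, so 00201.png is dropped entirely.
whole=$(printf '%s\n' "$names" | grep -xE '0*[1-9]+\.png')
printf 'partial:\n%s\nwhole:\n%s\n' "$partial" "$whole"
```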

Upvotes: 1
