Jatin Khurana

Reputation: 1175

Regular expression in bash not working

Is there a way in bash to match a pattern like this:

[0-9]{8}.*.jpg

I wrote the above to match "the first 8 characters are digits, the rest can be anything, and the name ends with .jpg", but it does not work. If I write it in the following way, it works:

[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*.jpg

Now suppose I want the first 20 characters to be digits: should I repeat [0-9] 20 times? I think there is a better solution that I don't know about.

If anyone knows, please help.

Upvotes: 1

Views: 797

Answers (3)

clt60

Reputation: 63902

A bit late answer.

Bash's filename expansion patterns (called globbing) have their own rules. They exist in two forms:

  • simple globbing
  • extended globbing (if you have enabled it with shopt -s extglob)

You can read about both sets of rules, for example, in the Bash Reference Manual (3.5.8.1 Pattern Matching).

Remember that the globbing rules are not traditional regular expressions (as you probably know them from grep, sed and the like), and especially not Perl's (extended) regular expressions.
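
Since glob patterns have no {8}-style repetition, one way to avoid typing [0-9] many times is to build the pattern in a loop. The following is a minimal sketch, not part of the original answer: digit_glob is a hypothetical helper name, and the demo files are created in a temporary directory purely for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a glob fragment made of N copies of [0-9].
digit_glob() {
    local count=$1 pattern='' i
    for (( i = 0; i < count; i++ )); do
        pattern+='[0-9]'
    done
    printf '%s' "$pattern"
}

shopt -s nullglob                 # an unmatched glob expands to nothing

# Throwaway demo tree (assumption: mktemp is available)
demo=$(mktemp -d)
touch "$demo/12345678aaa.jpg" "$demo/1234567aaa.jpg"

prefix=$(digit_glob 8)
# $prefix is left unquoted on purpose so that it is treated as a glob
matches=( "$demo"/$prefix*.jpg )
printf '%s\n' "${matches[@]##*/}"

rm -rf "$demo"
```

For the 20-digit case from the question, digit_glob 20 would produce the longer prefix without any manual repetition.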

So, if you want to use filename expansion (aka globbing), you're stuck with the above two (simple/extended) pattern rules. Of course, bash knows regular expressions, but not for filename expansion (globbing).

So you can, for example, do the following:

shopt -s globstar        # if not already enabled - needed for the ** expansion
regex='/[0-9]{8}[^/]*\.jpg$'   # anchored to the file name, not the whole path
for file in ./**/*.jpg   # matches all *.jpg recursively (globstar)
do
    # try the regex matching
    [[ $file =~ $regex ]] || continue  # didn't match

    # matched! - do something with the file
    echo "the name of $file starts with at least 8 digits"
done

Alternatively, you can use the find command with its built-in regex matching (see the other answers), or grep with Perl-like regexes, such as:

find somewhere -maxdepth 1 -type f -name '*.jpg' -print0 | grep -zP '/\d{8}[^/]*\.jpg$'

As for speed: on large trees, find is faster. At least on my notebook, where:

while IFS= read -d $'\0' -r file
do
    echo "$file"
done < <(find ~/Pictures -name \*.JPG -print0 | grep -zP 'P\d{4}.*\.JPG')

runs in real 0m1.593s, and

regex="P[0-9]{4}.*\.JPG"
for file in ~/Pictures/**/*.JPG
do
    [[ $file =~ $regex ]] || continue  #didn't match

    echo "$file"
done

runs in real 0m3.628s.

On small trees, IMHO it is better to use the built-in bash regexes. (Maybe I prefer them because I like the ./**/*.ext expansion, which gets all filenames correctly into the variable, regardless of spaces and the like, without having to care about -print0, read -d $'\0' and such...)
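
When using the built-in regexes, it can help to test only the base name, so that digits in a directory name do not cause a false match. A minimal sketch under that assumption follows; starts_with_digits is a hypothetical helper name, and the optional second argument (the digit count) is an addition not found in the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the base name of $1 begins with at
# least N digits (default 8) and ends in .jpg.
starts_with_digits() {
    local path=$1 count=${2:-8}
    local name=${path##*/}                      # strip the directory part
    local regex="^[0-9]{${count}}.*\\.jpg\$"    # e.g. ^[0-9]{8}.*\.jpg$
    [[ $name =~ $regex ]]                       # regex must stay unquoted
}

for path in ./a/12345678aaa.jpg ./12345678.dir/abc.jpg; do
    if starts_with_digits "$path"; then
        echo "match:    $path"
    else
        echo "no match: $path"
    fi
done
```

Inside a globstar loop this could be used as starts_with_digits "$file" || continue.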

Upvotes: 1

fedorqui

Reputation: 289725

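You can use a regex in find. Note that -regex is matched against the whole path, so the pattern needs a leading .*/ rather than a ^ anchor:

find test -regextype posix-extended -regex ".*/[0-9]{8}.*\.jpg"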

Test

$ touch test/12345678aaa.jpg
$ touch test/1234567aaa.jpg

$ find test -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg

And if it is related to the previous question, you can use:

# NUL-delimited read, so file names containing spaces are not split
while IFS= read -r -d '' file
do
   echo "my file is $file"
done < <(find test -regextype posix-extended -regex ".*/[0-9]{8}.*" -print0)

If you create directories and files in them, more matches can appear:

$ mkdir test/123456789.dir
$ touch test/123456789.dir/1234567890.jpg

You can filter by -type f, so that you just get files:

$ find test -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
test/123456789.dir/1234567890.jpg

And/or limit the depth of the search, so that find does not descend into subdirectories:

$ find test -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
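
The pieces above can be combined into a small function whose results are collected into an array. This is only a sketch: find_digit_jpgs is a hypothetical name, the pattern is tightened to [^/]*\.jpg so that it stays within the file name, and it assumes GNU find, GNU sort and bash 4.4 or later (for mapfile -d).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print, NUL-delimited, the regular files directly
# inside $1 whose names start with N digits (default 8) and end in .jpg.
find_digit_jpgs() {
    local dir=$1 count=${2:-8}
    find "$dir" -maxdepth 1 -type f -regextype posix-extended \
         -regex ".*/[0-9]{${count}}[^/]*\.jpg" -print0
}

# Throwaway demo tree, including a name with spaces and a nested file
demo=$(mktemp -d)
mkdir "$demo/123456789.dir"
touch "$demo/12345678aaa.jpg" "$demo/1234567aaa.jpg" \
      "$demo/12345678 with space.jpg" "$demo/123456789.dir/1234567890.jpg"

# NUL-delimited read keeps names with spaces intact
mapfile -d '' -t matches < <(find_digit_jpgs "$demo" | LC_ALL=C sort -z)
printf 'matched: %s\n' "${matches[@]##*/}"

rm -rf "$demo"
```

Only the two top-level files with 8 leading digits should be reported; the file inside 123456789.dir is excluded by -maxdepth 1.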

Upvotes: 3

John Zwinck

Reputation: 249153

It looks like you're trying to generate a list of filenames from a regular expression. You can do that, but not directly from Bash as far as I know. Instead, use find:

find -E . -regex '.*/[0-9]{8}.*\.jpg' -depth 1

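Something like that works on my Mac OS X system, where -E enables extended regular expressions. I added -depth 1 to limit the results to the top-level directory. On Linux (GNU find) the . for the current directory is optional, but -E and -depth 1 are not supported there; use -regextype posix-extended and -maxdepth 1 instead. You can also specify a different directory to search in.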

Upvotes: 2
