dilipyadav

Reputation: 53

Regular expression matching and finding files of a particular type using a shell script

I have a situation where I am iterating over a folder and its subfolders using a shell script. Once I reach a subdirectory whose name looks like 1.5.8 or 10.50.80.500 (basically containing only numbers and periods), I am trying to find two kinds of files, with the extensions *.jar and *.pom. The problem is that all the folders whose names are made of letters also satisfy the condition on the name. I'd appreciate your help.

    for FILE in $(find . -type d) ;
    do
        if test -d $FILE ;
        then
            if [[ $FILE == */[[[1-9]\.[1-9]\.[1-9]] || [[1-9]*\.[1-9]*\.[1-9]*]] ]] ;
            then
                echo "**********************************************************"
                echo $FILE
                #echo "**********************************************************"

                pom_file=find . -name "*.pom"
                jar_file=find . -name "*.jar"

            fi
        fi
    done;

Currently the code prints every directory. Below is the output:

    ./javax
    **********************************************************
    ./javax/biz
    **********************************************************
    ./javax/biz/pivotal
    **********************************************************
    ./javax/biz/pivotal/active
    **********************************************************
    ./javax/biz/pivotal/active/4.3.4
    **********************************************************

Ideally, only the last directory in that output (./javax/biz/pivotal/active/4.3.4) should be printed, since it is the only one that matches the pattern.

Upvotes: 0

Views: 798

Answers (3)

tshiono

Reputation: 22087

Please try the following:

    find . -regextype posix-egrep -regex '.*/([[:digit:]]+\.)*[[:digit:]]+/[^/]+\.(jar|pom)'

It matches a file which satisfies:

  • has an extension .jar or .pom
  • the parent directory name consists of digits, optionally separated by dots, such as 1, 1.20, 1.20.3, etc.
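To check the expression before running it on a real repository, you can point it at a throwaway tree. This is a minimal sketch assuming GNU find (for -regextype) and bash; apart from the javax/biz/pivotal/active/4.3.4 path from the question, the directory and file names are hypothetical.

```shell
# Build a throwaway tree; names other than the question's 4.3.4 path are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/javax/biz/pivotal/active/4.3.4" "$root/javax/biz/docs"
touch "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.jar" \
      "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.pom" \
      "$root/javax/biz/docs/readme.jar"   # alphabetic parent: should not match

# Run the answer's command from inside the tree (GNU find assumed).
matches=$(cd "$root" && find . -regextype posix-egrep \
    -regex '.*/([[:digit:]]+\.)*[[:digit:]]+/[^/]+\.(jar|pom)' | sort)
printf '%s\n' "$matches"

rm -rf "$root"
```

Only the two files under 4.3.4 are listed; readme.jar is skipped because its parent directory, docs, is not made of digits and dots.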

[Update] If you want to print just the directory names which contain the specified files, you can say instead:

    find . -regextype posix-egrep -regex '.*/([[:digit:]]+\.)*[[:digit:]]+/[^/]+\.(jar|pom)' -print0 | xargs -0 -L 1 dirname
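Because the dirname pipeline prints one line per matching file, a directory that holds both a .jar and a .pom is listed twice. Here is a sketch of a variant, assuming GNU find, whose -printf '%h\n' prints the directory part of each match so that sort -u can drop the repeats. The tree (including org/example/lib/10.50.80.500) is hypothetical.

```shell
# Hypothetical tree: 4.3.4 holds both a .jar and a .pom; docs has an alphabetic name.
root=$(mktemp -d)
mkdir -p "$root/javax/biz/pivotal/active/4.3.4" \
         "$root/org/example/lib/10.50.80.500" \
         "$root/javax/biz/docs"
touch "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.jar" \
      "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.pom" \
      "$root/org/example/lib/10.50.80.500/lib.pom" \
      "$root/javax/biz/docs/readme.jar"

# %h is the directory part of each matching path; sort -u keeps one copy of each.
dirs=$(cd "$root" && find . -regextype posix-egrep \
    -regex '.*/([[:digit:]]+\.)*[[:digit:]]+/[^/]+\.(jar|pom)' \
    -printf '%h\n' | sort -u)
printf '%s\n' "$dirs"

rm -rf "$root"
```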

Hope this helps.

Upvotes: 2

Camusensei

Reputation: 1563

Going the other way around, you could find all pom/jar files and only print the directories you want:

    find . \( -name '*.pom' -or -name '*.jar' \) -printf '%P\n' |
      sed -En '\_/[0-9.]+/[^/]+\.(jar|pom)$_ s_/[^/]+$__p' |
      sort -u

Here, find lists the pom/jar files, sed keeps only the paths whose parent directory consists of characters in [0-9.] and prints that directory, and sort -u removes duplicates.
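One detail worth knowing when adapting this: find's implicit -and binds more tightly than -or, so in -name '*.pom' -or -name '*.jar' -printf '%P\n' the -printf applies only to the .jar test, and directories that contain only .pom files are silently dropped. This sketch groups the two -name tests with escaped parentheses and runs the pipeline on a hypothetical tree (GNU find and sed assumed). The directory 10.50.80.500 contains only a .pom, so it shows whether the grouping works.

```shell
# Hypothetical tree: one directory with only a .jar, one with only a .pom.
root=$(mktemp -d)
mkdir -p "$root/javax/biz/pivotal/active/4.3.4" \
         "$root/org/example/10.50.80.500" \
         "$root/javax/biz/docs"
touch "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.jar" \
      "$root/org/example/10.50.80.500/example.pom" \
      "$root/javax/biz/docs/readme.jar"

# \( ... \) makes -printf apply to both -name tests, not just the .jar one.
dirs=$(cd "$root" &&
  find . \( -name '*.pom' -or -name '*.jar' \) -printf '%P\n' |
  sed -En '\_/[0-9.]+/[^/]+\.(jar|pom)$_ s_/[^/]+$__p' |
  sort -u)
printf '%s\n' "$dirs"

rm -rf "$root"
```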

Note: If you care about the order in which find outputs the directories, you can remove duplicates without sorting, keeping the original order, by replacing sort -u with awk '!seen[$0]++'.
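To see the difference between the two deduplication methods, here is a small sketch on made-up input lines. awk '!seen[$0]++' prints a line only the first time it appears (the counter for that line is still 0, so the negation is true), which keeps find's order, while sort -u reorders the output.

```shell
# Made-up directory names with repeats, in the order find might emit them.
input='./b/2.0
./a/1.0
./b/2.0
./a/1.0'

# seen[$0]++ evaluates to 0 the first time a line is seen, so !seen[$0]++ is true once per line.
deduped=$(printf '%s\n' "$input" | awk '!seen[$0]++')
sorted=$(printf '%s\n' "$input" | sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$deduped" "$sorted"
```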

EDIT: More information about sed: I'm using the match-substitute-print idiom, which works as follows:

  • because of -n, a line is printed only when the substitution succeeds (the p flag)
  • the substitution is attempted only on lines that match the preceding address regex
  • I'm using _ as a delimiter instead of / because the patterns themselves contain /
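The effect of the address can be seen by running the sed command with and without it on a few sample lines (hypothetical paths, GNU sed assumed). The substitution alone would also strip the file name from javax/biz/docs/readme.jar, but the address restricts it to paths whose parent directory is made of digits and dots.

```shell
# Sample lines in the format find -printf '%P\n' produces (paths are hypothetical).
input='javax/biz/pivotal/active/4.3.4/active-4.3.4.jar
javax/biz/docs/readme.jar
notes.txt'

# With the address: only lines matching \_..._ are substituted, and -n plus p prints them.
with_address=$(printf '%s\n' "$input" |
  sed -En '\_/[0-9.]+/[^/]+\.(jar|pom)$_ s_/[^/]+$__p')

# Without the address: every line containing a / is substituted and printed.
without_address=$(printf '%s\n' "$input" | sed -En 's_/[^/]+$__p')

printf 'with address:\n%s\nwithout address:\n%s\n' "$with_address" "$without_address"
```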

Upvotes: 0

stephanmg

Reputation: 766

I'd suggest switching from Bash pattern matching to regex matching by changing == to =~, as follows (is it intentional that you exclude the digit 0?):

    if [[ "$FILE" =~ /[1-9]+\.[1-9]+\.[1-9]+$ ]]; then
        echo "File matches"
    fi
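For the question's loop, the reason every directory was printed appears to be the || inside [[ ]]: its right-hand side, [[1-9]*\.[1-9]*\.[1-9]*]], is a bare word, and a lone non-empty string in [[ ]] always tests true. Here is a sketch of the loop rewritten around =~, assuming bash. It also uses $(...) to capture the find results, which the pom_file=find ... lines in the question do not do. It accepts [0-9] and requires at least one dot, which is an assumption about the intended names, and it builds a hypothetical tree to run against.

```shell
#!/usr/bin/env bash
# Hypothetical tree to run the loop against.
root=$(mktemp -d)
mkdir -p "$root/javax/biz/pivotal/active/4.3.4" "$root/javax/biz/docs"
touch "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.jar" \
      "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.pom" \
      "$root/javax/biz/docs/readme.jar"

matched_dirs=()
cd "$root" || exit 1
# -print0 with read -d '' keeps directory names containing spaces intact.
while IFS= read -r -d '' dir; do
    # Last path component: digits separated by dots, e.g. 1.5.8 or 10.50.80.500.
    if [[ $dir =~ /[0-9]+(\.[0-9]+)+$ ]]; then
        matched_dirs+=("$dir")
        # $(...) captures find's output; -maxdepth 1 looks only in this directory.
        pom_files=$(find "$dir" -maxdepth 1 -name '*.pom')
        jar_files=$(find "$dir" -maxdepth 1 -name '*.jar')
        echo "**********************************************************"
        echo "$dir"
        echo "pom: $pom_files"
        echo "jar: $jar_files"
    fi
done < <(find . -type d -print0)
cd - >/dev/null || exit 1

rm -rf "$root"
```

With this tree, only ./javax/biz/pivotal/active/4.3.4 is printed, together with its .pom and .jar file.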

A more reasonable approach is probably to use find with its -regex option (or -iregex for case-insensitive matching). That way, the whole thing can be compressed into a single command.
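Here is a sketch of that single command, assuming GNU find. -iregex matches the whole path case-insensitively, so a hypothetical .JAR file is picked up as well. -regextype posix-extended, placed before the test it affects, enables the (...)+ and | syntax.

```shell
# Hypothetical tree, including an upper-case .JAR extension.
root=$(mktemp -d)
mkdir -p "$root/javax/biz/pivotal/active/4.3.4" "$root/javax/biz/docs"
touch "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.JAR" \
      "$root/javax/biz/pivotal/active/4.3.4/active-4.3.4.pom" \
      "$root/javax/biz/docs/readme.jar"

# -iregex must match the entire path; the parent directory must be digits and dots.
files=$(cd "$root" && find . -regextype posix-extended -type f \
    -iregex '.*/[0-9]+(\.[0-9]+)+/[^/]+\.(jar|pom)' | sort)
printf '%s\n' "$files"

rm -rf "$root"
```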

Upvotes: 2
