Rad'Val

Reputation: 9241

Regex for directory and file listing in C

I'm trying to write an expression that filters out several kinds of directories and files when listing the contents of a directory. Namely, I want to skip the current directory (.), the parent directory (..), hidden files, and a few other specific directories.

This is what I have now:

[\\.+]|cgi-bin|recycle_bin

However, it doesn't match ., .., recycle_bin, or cgi-bin. If I remove all the | operators and reduce the expression to just [\\.+], it works (it matches ., .., etc.). That's strange, because I'm pretty sure | means OR. Am I missing something?

UPDATE 1: Here is the code I use:

            regex_t regex;
            int reti;
            char msgbuf[100];

            /* Compile regular expression */
            reti = regcomp(&regex, "[\\.+]|cgi-bin|recycle_bin", 0);


            if( reti )
            { 
                fprintf(stderr, "Could not compile regex\n");
                exit(1);
            }

            reti = regexec(&regex, entry->d_name, 0, NULL, 0);
            if( !reti ){

                printf("directory %s found -> %s", path, entry->d_name);
                printf("\n");

            }
            else if( reti == REG_NOMATCH ){

                //if the directory is not filtered out, we add it to the watch list
                printf("good dir %s", entry->d_name);                    
                printf("\n");

            }
            else{
                regerror(reti, &regex, msgbuf, sizeof(msgbuf));
                fprintf(stderr, "Regex match failed: %s\n", msgbuf);
            }

Upvotes: 1

Views: 2509

Answers (3)

Dave

Reputation: 11162

This isn't what the C regex library is for. Its purpose is to let you build programs which accept regexen as input. This problem is solved much better without regex:

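#define SIZE(x) (sizeof (x)/sizeof(*(x)))
const char *unwanted[] = {
     ".",
     "cgi-bin",
     "recycle_bin",
};
size_t x;   /* size_t, to match the type of SIZE() */
for(x=0; x<SIZE(unwanted); x++)
     /* substring test, so "." also catches names like notes.txt */
     if(strstr(entry->d_name, unwanted[x])!=NULL)
           goto BadDir;
//good dir
BadDir: ;   /* empty statement: in C17 and earlier a label must precede a statement */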

Ignoring what your present regex means, you probably want something like:

const char *begins[] = {".", "private_"};
const char *equals[] = {"recycle_bin", "cgi-bin"};
const char *contains[] = {"_reject_"};
size_t x;

/* prefix match: catches ".", ".." and hidden files */
for(x=0; x<SIZE(begins); x++)
    if(strncmp(entry->d_name, begins[x], strlen(begins[x]))==0)
          goto BadDir;
/* exact match */
for(x=0; x<SIZE(equals); x++)
    if(strcmp(entry->d_name, equals[x])==0)
          goto BadDir;
/* substring match */
for(x=0; x<SIZE(contains); x++)
    if(strstr(entry->d_name, contains[x])!=NULL)
          goto BadDir;
//good dir...
BadDir: ;
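Wrapped as a function that returns instead of jumping to a label, the same begins/equals/contains idea can be compiled and tested on its own. This is only a sketch: the function name is_unwanted is hypothetical, and the list contents are the example values from above.

```c
#include <string.h>

#define SIZE(x) (sizeof (x) / sizeof *(x))

/* Example lists from the answer above; adjust to your own needs. */
static const char *const begins[]   = { ".", "private_" };
static const char *const equals[]   = { "recycle_bin", "cgi-bin" };
static const char *const contains[] = { "_reject_" };

/* Hypothetical helper: returns 1 if `name` should be skipped, 0 otherwise. */
int is_unwanted(const char *name)
{
    size_t i;

    /* prefix match: ".", "..", ".hidden", "private_..." */
    for (i = 0; i < SIZE(begins); i++)
        if (strncmp(name, begins[i], strlen(begins[i])) == 0)
            return 1;
    /* exact match of the whole name */
    for (i = 0; i < SIZE(equals); i++)
        if (strcmp(name, equals[i]) == 0)
            return 1;
    /* substring anywhere in the name */
    for (i = 0; i < SIZE(contains); i++)
        if (strstr(name, contains[i]) != NULL)
            return 1;
    return 0;
}
```

Inside a readdir() loop this becomes `if (is_unwanted(entry->d_name)) continue;`, which avoids the goto entirely.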

Upvotes: 0

pmg

Reputation: 108988

Use extended REs by passing REG_EXTENDED to regcomp(). In basic ("obsolete") REs, which is what you get with cflags set to 0, | is an ordinary character.

regcomp(..., REG_EXTENDED);

Also see the regcomp() description.
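A small sketch can show the difference by compiling the question's pattern with and without REG_EXTENDED. Note that in C source "[\\.+]" is the bracket expression [\.+], which matches any single \, . or + anywhere in the name, so even with REG_EXTENDED the pattern also catches names such as notes.txt; anchoring with ^...$ is one way to avoid that. The helper name name_matches and the macro QUESTION_PATTERN are hypothetical; regcomp(), regexec(), regerror() and regfree() are the standard POSIX <regex.h> calls.

```c
#include <regex.h>
#include <stdio.h>

/* The question's pattern, exactly as written in C source. */
#define QUESTION_PATTERN "[\\.+]|cgi-bin|recycle_bin"

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `name`,
 * 0 if it does not, -1 on a compile or match error. `cflags` goes to
 * regcomp(): 0 gives a basic RE (where '|' is literal), REG_EXTENDED an
 * extended RE (where '|' is alternation). */
int name_matches(const char *pattern, int cflags, const char *name)
{
    regex_t re;
    char msg[100];
    int rc = regcomp(&re, pattern, cflags | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "Could not compile regex: %s\n", msg);
        return -1;
    }
    rc = regexec(&re, name, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "Regex match failed: %s\n", msg);
    }
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

In a real readdir() loop you would call regcomp() once before the loop and regfree() once after it, rather than recompiling for every entry as this helper does.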

Upvotes: 3

Jack

Reputation: 16724

Follow pmg's advice (compile with REG_EXTENDED) and try this regex:

^([.]{0,2}|cgi-bin|recycle_bin)$

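[.]{0,2} matches . and .. (and the empty string, which d_name never is), and the ^ and $ anchors force the whole name to match, so a name like my-cgi-bin is not filtered out.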

Upvotes: 0
