Giovanni

Reputation: 121

Java Regex file extension

I have to check whether a file name ends with a gzip extension. In particular, I'm looking for two extensions: ".tar.gz" and ".gz". I would like to capture the file name (and path) as a group using a single regular expression, excluding the gzip extension if there is one. I tested the following regular expressions on this example path:

String path = "/path/to/file.txt.tar.gz";

  1. Expression 1:

    String rgx = "(.+)(?=([\\.tar]?\\.gz)$)";
    
  2. Expression 2:

    String rgx = "^(.+)[\\.tar]?\\.gz$";
    

Extracting group 1 in this way:

Matcher m = Pattern.compile(rgx).matcher(path);           
if(m.find()){
   System.out.println(m.group(1));
}

Both regular expressions give me the same result: /path/to/file.txt.tar and not /path/to/file.txt. Any help will be appreciated.

Thanks in advance

Upvotes: 3

Views: 11734

Answers (3)

aioobe

Reputation: 421220

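You need to make the part that matches the file name reluctant, i.e. change (.+) to (.+?). Note also that [\\.tar] is a character class that matches a single character (one of '.', 't', 'a' or 'r'), not the literal text ".tar"; use a group, (\\.tar), instead. Keep the $ anchor so that ".gz" is only accepted at the end of the name:

String rgx = "^(.+?)(\\.tar)?\\.gz$";
//              ^^^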

Now you get:

Matcher m = Pattern.compile(rgx).matcher(path);           
if(m.find()){
   System.out.println(m.group(1));   //   /path/to/file.txt
}
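
To see why the quantifier matters, here is a minimal sketch that runs a greedy and a reluctant variant against the path from the question. The class name GreedyVsReluctant and the helper firstGroup are made up for this illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "/path/to/file.txt.tar.gz";
        // Greedy: (.+) keeps as much as it can, so the optional group matches nothing.
        System.out.println(firstGroup("^(.+)(\\.tar)?\\.gz$", path));  // /path/to/file.txt.tar
        // Reluctant: (.+?) keeps as little as it can, leaving ".tar" to the optional group.
        System.out.println(firstGroup("^(.+?)(\\.tar)?\\.gz$", path)); // /path/to/file.txt
    }
}
```

The greedy version only gives characters back when the rest of the pattern cannot match otherwise, and since (\\.tar)? is allowed to match nothing, ".tar" stays in group 1.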

Upvotes: 3

Mena

Reputation: 48444

You can use the following idiom to match both your path + file name and the gzip extension in one go:

String[] inputs = {
        "/path/to/foo.txt.tar.gz", 
        "/path/to/bar.txt.gz",
        "/path/to/nope.txt"
 };
//                           ┌ group 1: any character reluctantly quantified
//                           |    ┌ group 2
//                           |    | ┌ optional ".tar"
//                           |    | |       ┌ compulsory ".gz"
//                           |    | |       |     ┌ end of input
Pattern p = Pattern.compile("(.+?)((\\.tar)?\\.gz)$");
for (String s: inputs) {
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.printf("Found: %s --> %s %n", m.group(1), m.group(2));
    }
}

Output

Found: /path/to/foo.txt --> .tar.gz 
Found: /path/to/bar.txt --> .gz 
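
If you want the name back "excluding the gzip extension if any", as the question puts it, the same pattern can be wrapped in a small helper. This is only a sketch: the class GzipNames and the method stripGzipExtension are hypothetical names, and it assumes that a name without a gzip extension should be returned unchanged. It uses matches(), which requires the whole input to match, so the trailing $ is not needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GzipNames {
    // Group 1: path + file name, group 2: ".gz" or ".tar.gz".
    private static final Pattern GZIP = Pattern.compile("(.+?)((\\.tar)?\\.gz)");

    // Hypothetical helper: strips a trailing ".gz" / ".tar.gz", otherwise returns the input as-is.
    static String stripGzipExtension(String path) {
        Matcher m = GZIP.matcher(path);
        return m.matches() ? m.group(1) : path;
    }

    public static void main(String[] args) {
        System.out.println(stripGzipExtension("/path/to/foo.txt.tar.gz")); // /path/to/foo.txt
        System.out.println(stripGzipExtension("/path/to/bar.txt.gz"));     // /path/to/bar.txt
        System.out.println(stripGzipExtension("/path/to/nope.txt"));       // /path/to/nope.txt
    }
}
```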

Upvotes: 4

Avinash Raj

Reputation: 174826

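Use a regex based on capturing groups, with a reluctant quantifier on the file name so that an optional ".tar" is not swallowed by it:

^(.+)/(.+?)(?:\\.tar)?\\.gz$

Then:

Get the path from group 1.

Get the file name from group 2.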
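
As a rough sketch of how the two groups could be read in Java (the class SplitGzipPath and the method split are invented for this example, and the file-name group is written reluctantly as (.+?) so that ".tar" is not captured with it):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitGzipPath {
    // Group 1: directory, group 2: file name without the gzip extension.
    private static final Pattern P = Pattern.compile("^(.+)/(.+?)(?:\\.tar)?\\.gz$");

    // Hypothetical helper: returns {directory, fileName}, or null when the input does not match.
    static String[] split(String path) {
        Matcher m = P.matcher(path);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = split("/path/to/file.txt.tar.gz");
        System.out.println(parts[0]); // /path/to
        System.out.println(parts[1]); // file.txt
    }
}
```

Note that this pattern expects at least one character before the last "/", so a bare "file.gz" or "/file.gz" does not match.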


Upvotes: 1
