Mary

Reputation: 1595

Regular expression to handle two different file extensions

I am trying to create a regular expression that matches a file name of the form "abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz".

How can I handle the "OR" condition for the extension? This is what I have so far:

if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")){
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}

This handles only ".txt". How can I also handle ".txt.gz"? Thanks.

Upvotes: 0

Views: 1190

Answers (6)

user7571182

Reputation:

You can use the following regex:

^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$

Explanation of the above regex:

^, $ - Match the start and end of the test string, respectively.

[\w-]+ - Matches one or more word characters (letters, digits, underscore) or hyphens.

\d{n} - Matches exactly n digits, where n is the number in the curly braces.

(?:\.gz)? - A non-capturing group that matches .gz zero or one time, because of the ? quantifier. You could have used | alternation (the "OR" you were expecting), but this form is more legible and avoids repeating the .txt part.

You can find the demo of the above regex here.
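The | alternation mentioned in the last point is exactly the "OR" the question asks about. A minimal sketch, assuming you only need .txt and .txt.gz, that checks the optional-group form and an alternation form side by side (ExtensionAlternatives is just an illustrative class name):

```java
import java.util.regex.Pattern;

public class ExtensionAlternatives {
    // Optional-group form from the answer: ".txt" followed by an optional ".gz".
    static final Pattern OPTIONAL_GZ =
            Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$");

    // Alternation form: spell out both allowed extensions separated by |.
    static final Pattern ALTERNATION =
            Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.(?:txt|txt\\.gz)$");

    public static void main(String[] args) {
        String[] names = {"abcd_04-04-2020.txt", "abcd_04-04-2020.txt.gz", "abcd_04-04-2020.tar"};
        for (String name : names) {
            boolean optional = OPTIONAL_GZ.matcher(name).matches();
            boolean alternation = ALTERNATION.matcher(name).matches();
            // Both columns agree: true, true, then false for ".tar".
            System.out.println(name + " -> " + optional + " / " + alternation);
        }
    }
}
```

Both forms accept the same names here; the optional group just avoids writing txt twice.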


IMPLEMENTATION IN JAVA:

import java.util.regex.*;
public class Main
{
    // MULTILINE makes ^ and $ match at the start and end of each line,
    // so every line of the test string is checked separately.
    private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);
    public static void main(String[] args) {
        String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
        Matcher matcher = pattern.matcher(testString);
        // Print every matching line; the bare ".txt.gz" line is rejected.
        while(matcher.find()){
            System.out.println(matcher.group(0));
        }
    }
}

You can find the above implementation in Java here.

NOTE: If you also want to check that the dates are valid, please see this.
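One possible approach (not necessarily the one behind the link) is to capture the date with a group and let java.time reject impossible dates. This sketch assumes the date is month-day-year, since 04-04-2020 doesn't say which; swap the formatter pattern to "dd-MM-uuuu" if your files use day-month-year. DatedFileName and isValid are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatedFileName {
    // Same shape as the regex above, with the date captured in group 1.
    static final Pattern NAME =
            Pattern.compile("^[\\w-]+(\\d{2}-\\d{2}-\\d{4})\\.txt(?:\\.gz)?$");

    // Assumption: month-day-year. STRICT resolving rejects dates such as 02-30-2020;
    // "uuuu" (not "yyyy") is needed for STRICT to accept a plain year.
    static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("MM-dd-uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return false; // wrong shape or wrong extension
        }
        try {
            LocalDate.parse(m.group(1), DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false; // right shape, but not a real calendar date
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcd_04-04-2020.txt.gz")); // true
        System.out.println(isValid("abcd_02-30-2020.txt"));    // false
        System.out.println(isValid("abcd_13-01-2020.txt"));    // false
    }
}
```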

Upvotes: 2

Charlie Armstrong

Reputation: 2342

I think what you want (following from the direction you were going) is this:

[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)

At the end, I have an alternation. It has to either match the end of the string ($) OR match a literal dot followed by two letters (\\.[a-zA-Z]{2}). Remember to escape the ., because in regex . means "match any character".
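Because the letter classes aren't tied to txt and gz, this pattern also accepts other extensions. A quick sketch to check that with String.matches, which requires the whole string to match (GenericExtension is an illustrative class name):

```java
public class GenericExtension {
    // The answer's pattern: any three-letter extension, then either the end
    // of the string or a dot followed by two letters.
    static final String REGEX =
            "[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)";

    public static void main(String[] args) {
        String[] names = {
            "abcd_04-04-2020.txt",
            "abcd_04-04-2020.txt.gz",
            "abcd_04-04-2020.tar.gz", // also accepted: the letters are not restricted
            "abcd_04-04-2020.png"     // also accepted
        };
        for (String name : names) {
            System.out.println(name + " -> " + name.matches(REGEX)); // true for all four
        }
    }
}
```

If only .txt and .txt.gz should pass, replacing [a-zA-Z]{3} with txt and [a-zA-Z]{2} with gz restricts it.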

Upvotes: 1

rph

Reputation: 2659

A possible way of doing it:

Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");

Then you can run the following test:

String[] fileNames = {
        "abcd_04-04-2020.txt",
        "abcd_04-04-2020.tar",
        "abcd_04-04-2020.txt.gz",
        "abcd_04-04-2020.png",
        ".txt",
        ".txt.gz",
        "04-04-2020.txt"
};

Arrays.stream(fileNames)
        .filter(fileName -> pattern.matcher(fileName).find())
        .forEach(System.out::println);

// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz
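Since the extension sits in a capturing group, the same pattern can also tell you which extension matched. A small sketch, assuming you call matches() rather than find() (equivalent here because the pattern is anchored with ^ and $); extension and isCompressed are hypothetical helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtensionGroup {
    static final Pattern PATTERN =
            Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");

    // Returns the matched extension (".txt" or ".txt.gz"), or null if the name doesn't match.
    static String extension(String fileName) {
        Matcher m = PATTERN.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    // True when the optional ".gz" group (group 2) took part in the match.
    static boolean isCompressed(String fileName) {
        Matcher m = PATTERN.matcher(fileName);
        return m.matches() && m.group(2) != null;
    }

    public static void main(String[] args) {
        System.out.println(extension("abcd_04-04-2020.txt"));       // .txt
        System.out.println(extension("abcd_04-04-2020.txt.gz"));    // .txt.gz
        System.out.println(isCompressed("abcd_04-04-2020.txt.gz")); // true
        System.out.println(extension("abcd_04-04-2020.png"));       // null
    }
}
```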

Upvotes: 1

Arun

Reputation: 915

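The ? quantifier can do the job of the | you were looking for. Try adding

(\.[a-zA-Z]{2})?

to your original regex, escaping the dots so they match a literal "." rather than any character:

([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\.[a-zA-Z]{3}(\.[a-zA-Z]{2})?)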
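In Java regexes an unescaped . matches any character, so a pattern built with bare dots can accept names that contain no dots at all. A minimal sketch comparing an unescaped and an escaped version of this pattern on a made-up name:

```java
public class EscapedDot {
    // Unescaped: each "." means "any character".
    static final String UNESCAPED =
            "[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3}(.[a-zA-Z]{2})?";
    // Escaped: "\\." in a Java string literal is the regex \. , a literal dot.
    static final String ESCAPED =
            "[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(\\.[a-zA-Z]{2})?";

    public static void main(String[] args) {
        String odd = "abcd_04-04-2020_txt-gz";
        System.out.println(odd.matches(UNESCAPED)); // true: "_" and "-" stand in for the dots
        System.out.println(odd.matches(ESCAPED));   // false
    }
}
```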

Upvotes: 1

Balastrong

Reputation: 4474

You can replace .[a-zA-Z]{3} with \.txt(\.gz)? (in a Java string literal, each backslash must be doubled):

if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4})\\.txt(\\.gz)?")){
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}

Upvotes: 1

QuickSilver

Reputation: 4045

Why not just use endsWith instead of a complex regex?

if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){
    Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
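Note that endsWith only checks the extension, so a name like notes.txt also passes. A sketch of one way to keep the simple extension check while still validating the rest of the name; isDatedTextFile is a hypothetical helper, and the stem regex assumes the question's shape of a name, an underscore, then a date written as two digits, two digits, four digits:

```java
public class EndsWithCheck {
    // Hypothetical helper: check the extension with endsWith, then validate
    // only the remaining stem with a much simpler regex.
    static boolean isDatedTextFile(String fileName) {
        String stem;
        if (fileName.endsWith(".txt")) {
            stem = fileName.substring(0, fileName.length() - ".txt".length());
        } else if (fileName.endsWith(".txt.gz")) {
            stem = fileName.substring(0, fileName.length() - ".txt.gz".length());
        } else {
            return false;
        }
        return stem.matches("[\\w-]+_\\d{2}-\\d{2}-\\d{4}");
    }

    public static void main(String[] args) {
        System.out.println(isDatedTextFile("abcd_04-04-2020.txt"));    // true
        System.out.println(isDatedTextFile("abcd_04-04-2020.txt.gz")); // true
        System.out.println(isDatedTextFile("notes.txt"));              // false
    }
}
```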

Upvotes: 2
