Carlos Goce

Reputation: 1665

Java regular expression: extract filename not working

I have a list of URLs like this one: http://mysite.es/img/p/3/6/2/5/6/36256.jpg

I need to replace the 36256 part.

My regular expression looks like this:

Boolean find = Pattern.matches("^[http://].+/(?<number>\\d+)[\\.jpg]$", url);

But it always returns false. What am I doing wrong? I think the problem is that there are many "/" characters in the URL, and not all the URLs have the same number of them.

This works:

Boolean find = Pattern.matches("[\\.jpg]$", url);

This doesn't work:

Boolean find = Pattern.matches("/(\\d+)[\\.jpg]$", url);

I can't figure out why.

Thanks in advance

Upvotes: 1

Views: 1250

Answers (2)

Reimeus

Reputation: 159854

Assuming you mean

Boolean find = Pattern.matches(".*[\\.jpg]$", url);

and

Boolean find = Pattern.matches(".*/(\\d+)[\\.jpg]$", url);

The first pattern matches because it only needs one of the characters ., j, p or g before the end of the string; the leading .* consumes everything else. The second doesn't match because, after the digits, the character class allows exactly one such character before the end of the URL string, whereas the URL has four (.jpg).

You need to remove the use of the character class.

Boolean find = Pattern.matches(".*/(\\d+)\\.jpg$", url);
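As a minimal sketch of the corrected pattern in use, the capturing group can be read back through a Matcher. The class name FileNumberExtractor and the method extractNumber are made up for illustration and are not part of the original answer. Note that Pattern.matches and Matcher.matches must match the entire input, which is why the leading .* is needed:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this illustration.
class FileNumberExtractor {
    // Same pattern as the corrected one above, compiled once and reused.
    private static final Pattern NUMBERED_JPG = Pattern.compile(".*/(\\d+)\\.jpg$");

    // Returns the digits right before ".jpg", or null if the URL does not match.
    static String extractNumber(String url) {
        Matcher m = NUMBERED_JPG.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://mysite.es/img/p/3/6/2/5/6/36256.jpg";
        // Character class at the end: only one of '.', 'j', 'p', 'g' is allowed after the digits.
        System.out.println(Pattern.matches(".*/(\\d+)[\\.jpg]$", url)); // false
        // Literal ".jpg" sequence after the digits.
        System.out.println(Pattern.matches(".*/(\\d+)\\.jpg$", url));   // true
        System.out.println(extractNumber(url));                          // 36256
    }
}
```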

Upvotes: 3

Mena

Reputation: 48444

First, if your URLs all have "/" characters and a file type extension, you probably don't need regex for this.

For instance:

String url = "http://mysite.es/img/p/3/6/2/5/6/36256.jpg";
String toReplace = url.substring(url.lastIndexOf("/") + 1, url.lastIndexOf("."));
System.out.println(toReplace);
String replacedURL = url.replace(toReplace, "foo");
System.out.println(replacedURL);

Edit

// solution with regex
Pattern fileName = Pattern.compile(".+(?<!/)/(?!/)(.+?)\\..+?");
Matcher matcher = fileName.matcher(url);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    String replacedURLWithRegex = url.replace(matcher.group(1), "blah");
    System.out.println(replacedURLWithRegex);
}

Output:

36256
http://mysite.es/img/p/3/6/2/5/6/foo.jpg

Output for edit:

36256
http://mysite.es/img/p/3/6/2/5/6/blah.jpg
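One caveat worth keeping in mind: String.replace substitutes every occurrence of the target text, so if the file name also appears earlier in the URL (a directory called 12 and a file called 12.jpg, say), both get replaced. A possible sketch that rewrites only the final path segment is shown here. The class FileNameReplacer and the method replaceFileName are hypothetical names, and leaving the URL unchanged when no extension follows the last "/" is an assumption:

```java
// Hypothetical helper class, introduced only for this illustration.
class FileNameReplacer {
    // Replaces only the text between the last '/' and the last '.'.
    static String replaceFileName(String url, String newName) {
        int start = url.lastIndexOf('/') + 1;
        int end = url.lastIndexOf('.');
        if (end < start) {
            // No '.' after the last '/': assumed here to mean "nothing to replace".
            return url;
        }
        return url.substring(0, start) + newName + url.substring(end);
    }

    public static void main(String[] args) {
        String tricky = "http://mysite.es/img/12/12.jpg";
        System.out.println(tricky.replace("12", "foo"));    // http://mysite.es/img/foo/foo.jpg
        System.out.println(replaceFileName(tricky, "foo")); // http://mysite.es/img/12/foo.jpg
    }
}
```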

About what's wrong in your regex, "[\.jpg]" matches a single character from the class defined by the square brackets, that is "." or "j" or "p" or "g", not ".jpg" as a sequence. For sequential matching you don't use square brackets (although you can use round brackets to group a sequence).
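The difference can be checked with a small sketch like the following, where the class name CharClassDemo and its two methods are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, introduced only for this illustration.
class CharClassDemo {
    // Character class: the whole input must be exactly one of '.', 'j', 'p', 'g'.
    static boolean matchesClass(String s) {
        return Pattern.matches("[\\.jpg]", s);
    }

    // Literal sequence: the whole input must be exactly ".jpg".
    static boolean matchesSequence(String s) {
        return Pattern.matches("\\.jpg", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("j"));       // true
        System.out.println(matchesClass(".jpg"));    // false
        System.out.println(matchesSequence(".jpg")); // true
        System.out.println(matchesSequence("j"));    // false
    }
}
```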

Upvotes: 3
