J Freebird

Reputation: 3910

Java regular expression to match patterns in url

I have a bunch of urls that share the following pattern:

http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2

I want to extract item3a96a7f6d2. The http://www.ebay.com/itm/ and &hash= are fixed patterns while the string in between can change. I wrote:

String prodPatternString = "(http://www.ebay.com/itm/)(.*?)(hash=)(.*?)";
Pattern prodPattern = Pattern.compile(prodPatternString);
Matcher prodMatcher = prodPattern.matcher(prodUrl);
while (prodMatcher.find()) {
    String pid = matcher.group(4);
}

But it gives me an error saying "No match found". Any help will be greatly appreciated. Thanks.
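For context, "No match found" is the message of the IllegalStateException that Matcher.group throws when the matcher has no current match, for example when find() was never called on it. Below is a minimal sketch of that behavior, assuming the matcher variable in the question is some other Matcher that never matched; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoMatchFoundDemo {

    // Calls group() on a Matcher whose find() was never called and
    // returns the message of the resulting IllegalStateException.
    static String groupWithoutFind() {
        Matcher neverMatched = Pattern.compile("(hash=)(.*)").matcher("hash=abc");
        try {
            neverMatched.group(2); // no match attempted yet, so this throws
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(groupWithoutFind()); // prints "No match found"
    }
}
```

Calling find() (or matches()) successfully on the same Matcher before group() avoids the exception.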

Upvotes: 0

Views: 291

Answers (3)

Avinash Raj

Reputation: 174696

You need to change the line matcher.group(4); to prodMatcher.group(4);. The "No match found" error is an IllegalStateException, presumably thrown because matcher is a different Matcher that has not made a successful match. Then remove the ? inside the last capturing group: .*? is a non-greedy (lazy) match of zero or more characters, so at the end of the pattern it is satisfied by the empty string, even though characters remain after hash=.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String prodUrl = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
        String prodPatternString = "(http://www.ebay.com/itm/)(.*?)(hash=)(.*)";
        Pattern prodPattern = Pattern.compile(prodPatternString);
        Matcher prodMatcher = prodPattern.matcher(prodUrl);
        while (prodMatcher.find()) {
            String pid = prodMatcher.group(4); // greedy: everything after "hash="
            System.out.println(pid);
        }
    }
}

Output:

item3a96a7f6d2
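To see the difference between the lazy and greedy last group side by side, here is a small sketch that runs both patterns from this thread against the question's URL; the class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedyDemo {

    static final String URL = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";

    // Returns capturing group 4 of the first match, or null if nothing matches.
    static String group4(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        // Lazy last group: nothing after it forces it to consume anything.
        System.out.println("[" + group4("(http://www.ebay.com/itm/)(.*?)(hash=)(.*?)", URL) + "]"); // prints "[]"
        // Greedy last group: consumes the rest of the input.
        System.out.println("[" + group4("(http://www.ebay.com/itm/)(.*?)(hash=)(.*)", URL) + "]"); // prints "[item3a96a7f6d2]"
    }
}
```

A lazy group can also work if something after it forces it to expand, for example ending the pattern with (.*?)$, but plain greedy .* is simpler here.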

Upvotes: 1

anubhava

Reputation: 784898

You can use this regex:

(http://www.ebay.com/itm/)(.*?)(hash=)([^&]*)


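In your regex, the lazy .*? in the 4th capturing group matches as little as possible, which at the end of the pattern is the empty string. [^&]* instead consumes every character up to the next & or the end of the string, so it captures the whole hash value.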
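A minimal sketch of this regex in Java. Two assumptions beyond the answer: the dots are escaped (\\.) so they only match a literal ".", and the second test URL, with a trailing &extra=1 parameter, is hypothetical and only shows that [^&]* stops at the next &:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {

    // [^&]* captures everything after "hash=" up to the next '&' or end of input.
    static final Pattern HASH =
            Pattern.compile("(http://www\\.ebay\\.com/itm/)(.*?)(hash=)([^&]*)");

    static String extractHash(String url) {
        Matcher m = HASH.matcher(url);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
        System.out.println(extractHash(url));               // prints "item3a96a7f6d2"
        System.out.println(extractHash(url + "&extra=1"));  // still "item3a96a7f6d2"
    }
}
```

With the greedy (.*) from the other answer, the second call would instead return "item3a96a7f6d2&extra=1".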

Upvotes: 0

IByrd

Reputation: 177

You should check out the String.lastIndexOf method. Then you can take a substring of the URL starting just after the last occurrence of "&hash=" (its index plus the length of "&hash=") and running to the end of the string. This gives you the hash value, e.g. item3a96a7f6d2.
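A minimal sketch of this approach without regular expressions; the class and method names are hypothetical. Note the added marker length, without which the result would start with "&hash=":

```java
public class LastIndexOfDemo {

    static final String MARKER = "&hash=";

    // Returns the text after the last "&hash=", or null if the marker is absent.
    static String extractHash(String url) {
        int start = url.lastIndexOf(MARKER);
        if (start < 0) {
            return null;
        }
        // Skip past the marker itself so only the value is returned.
        return url.substring(start + MARKER.length());
    }

    public static void main(String[] args) {
        String url = "http://www.ebay.com/itm/Crosman-Pumpmaster-760-Pump-177-Pellet-4-5-mm-BB-Air-Rifle-Black-760B-/251635693266?pt=LH_DefaultDomain_0&hash=item3a96a7f6d2";
        System.out.println(extractHash(url)); // prints "item3a96a7f6d2"
    }
}
```

Because the substring runs to the end of the URL, any parameters that follow hash= would be included; this relies on hash= being the last parameter, as in the question's URLs.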

Upvotes: 0
