Kyle

Reputation: 3042

Problem finding a URL within HTML with regex?

            for (String line; (line = reader.readLine()) != null;) {//reads html page
                Pattern p = Pattern.compile("https://secure\\.runescape\\.com/m=displaynames/s=[a-zA-Z1-9*]+/check_name\\.ws\\?displayname=");
                Matcher m = p.matcher(line);
                if (m.find()) {
                    System.out.println(m.group(0));
                }

            }

The string in the page looks like: callback_request("https://secure.runescape.com/m=displaynames/s=p2FAuYaMFDgzntbDei*324JUo*3ozJ7hR*h1KNlxc6kPaBeKCBrdKH5kzljYSfUa/check_name.ws?displayname=" + escape(text), handleResult);

However, it's not returning any results. Am I doing something wrong? Apologies for the noobish question, I'm still learning Java.

Upvotes: 0

Views: 120

Answers (3)

Rom1

Reputation: 3207

You could use a regex tester for debugging, for instance here. A better expression is probably https://secure\.runescape\.com/m=displaynames/s=[a-zA-Z1-9*]+/check_name\.ws\?displayname=
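If an online tester is not at hand, the same kind of debugging can be sketched locally: split the expression into pieces and check how many leading pieces can still be found in the input. The class name `RegexPrefixDebugger`, the method `matchingPieces`, and the way the expression is cut into pieces are all assumptions made for illustration. The pieces have to be chosen by hand so that each prefix is itself a valid regex.

```java
import java.util.regex.Pattern;

class RegexPrefixDebugger {
    /**
     * Appends the pieces one by one and returns how many leading pieces
     * can be concatenated while the resulting regex is still found in input.
     */
    static int matchingPieces(String[] pieces, String input) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            regex.append(pieces[i]);
            if (!Pattern.compile(regex.toString()).matcher(input).find()) {
                return i; // piece i is the first one that breaks the match
            }
        }
        return pieces.length;
    }

    public static void main(String[] args) {
        // The expression from the question, cut by hand at the '/' and '?' boundaries.
        String[] pieces = {
            "https://secure\\.runescape\\.com",
            "/m=displaynames",
            "/s=[a-zA-Z1-9*]+",
            "/check_name\\.ws",
            "\\?displayname="
        };
        String line = "callback_request(\"https://secure.runescape.com/m=displaynames/"
                + "s=p2FAuYaMFDgzntbDei*324JUo*3ozJ7hR*h1KNlxc6kPaBeKCBrdKH5kzljYSfUa"
                + "/check_name.ws?displayname=\" + escape(text), handleResult);";
        int ok = matchingPieces(pieces, line);
        System.out.println(ok + " of " + pieces.length + " pieces match");
    }
}
```

A result below the number of pieces points at the first piece that fails. For example, a session token containing the digit 0 would stop at the fourth piece, because the class `[a-zA-Z1-9*]` does not include 0.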

Upvotes: 1

Favonius

Reputation: 13984

Going by your regular expression, the test string is missing a ?.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) 
    {
        Pattern p = Pattern.compile("https://secure\\.runescape\\.com/m=displaynames/.*/check_name\\.ws\\?displayname=(\\?)?");
        Matcher m = p.matcher("callback_request(\"https://secure.runescape.com/m=displaynames/s=p2FAuYaMFDgzntbDei*324JUo*3ozJ7hR*h1KNlxc6kPaBeKCBrdKH5kzljYSfUa/check_name.ws?displayname=\" + escape(text), handleResult);");
        if(m.find())
        {
            System.out.println(m.group(0));
        }
    }
}

I suppose the trailing ? in displayname=? comes from escape(text), so making that ? optional in the pattern should make it work. See the code above for details.

>>Output: https://secure.runescape.com/m=displaynames/s=p2FAuYaMFDgzntbDei*324JUo*3ozJ7hR*h1KNlxc6kPaBeKCBrdKH5kzljYSfUa/check_name.ws?displayname=
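To make the effect of the optional ? concrete, here is a minimal sketch that runs one pattern against an input with and without the trailing ?. The class name `OptionalQuestionMark`, the helper `firstMatch`, and the shortened session token are hypothetical. The sketch also swaps the greedy .* for the narrower `[^/"]+`, which is a substitution made here and not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuestionMark {
    // The trailing "\\??" means: a literal '?', zero or one time.
    private static final Pattern URL = Pattern.compile(
            "https://secure\\.runescape\\.com/m=displaynames/[^/\"]+/check_name\\.ws\\?displayname=\\??");

    /** Returns the first match in input, or null when there is none. */
    static String firstMatch(String input) {
        Matcher m = URL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened token; the real one is much longer.
        String base = "https://secure.runescape.com/m=displaynames/s=abc*123/check_name.ws?displayname=";
        // No trailing '?': the match ends at "displayname=".
        System.out.println(firstMatch("callback_request(\"" + base + "\" + escape(text))"));
        // Trailing '?': the match includes it.
        System.out.println(firstMatch("callback_request(\"" + base + "?\")"));
    }
}
```

Because find() searches for the pattern anywhere in the input, the optional ? does not decide whether a match exists. It only decides whether a trailing ? becomes part of the matched text.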

Upvotes: 2

Tim

Reputation: 1286

It looks like your regex is being matched on one line at a time. Are you sure that the URL you are searching for will always be on one line?
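If the URL can be wrapped across lines, one option is to glue the lines back together and match against the whole page instead. The following is a minimal sketch under stated assumptions: the class `WholePageUrlFinder` and its methods are hypothetical, the sample page with a shortened token is made up, and the character class is widened to `0-9` on the assumption that a token may contain the digit 0. Dropping the line terminators is only safe if line wrapping is the sole reason the URL is split.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WholePageUrlFinder {
    // Compiled once instead of once per line.
    private static final Pattern CHECK_NAME_URL = Pattern.compile(
            "https://secure\\.runescape\\.com/m=displaynames/s=[a-zA-Z0-9*]+/check_name\\.ws\\?displayname=");

    /**
     * Reads all lines and joins them without a separator.
     * lines() strips each line terminator; closing source is left to the caller.
     */
    static String readJoined(Reader source) {
        return new BufferedReader(source).lines().collect(Collectors.joining());
    }

    /** Returns the first matching URL in the page, if any. */
    static Optional<String> findUrl(String page) {
        Matcher m = CHECK_NAME_URL.matcher(page);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical page in which the URL is wrapped in the middle of the token.
        String html = "callback_request(\"https://secure.runescape.com/m=displaynames/s=p2FAuYaM\n"
                + "FDgzntbDei*324JUo/check_name.ws?displayname=\" + escape(text), handleResult);";
        String page = readJoined(new StringReader(html));
        System.out.println(findUrl(page).orElse("no match"));
    }
}
```

Matching line by line would find nothing in this sample, because neither line contains the complete URL on its own.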

Upvotes: 1
