user882347

Using regex to get information inside an HTML tag

I'm wondering how I could extract '4151' from the following code:

</th><td><a class="external exitstitial" rel="nofollow" href="http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151">Look up price</a>

I would like to use a regex, but if there is a better way I'm open to it!

Upvotes: 2

Views: 273

Answers (3)

Alistair A. Israel

Reputation: 6567

The following works for me, assuming the href attribute value has already been extracted:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
// Capture the digits that follow "?obj=".
Pattern p = Pattern.compile("\\?obj=(\\d+)");
Matcher m = p.matcher(href);
if (m.find()) {
    System.out.println(m.group(1));
}

This prints "4151".
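If the href has not been extracted yet, the same idea can probably be applied to the raw markup in one step. The following is only a sketch: the class name ItemIdExtractor and the method extractItemId are made up for illustration, and the pattern assumes the href value is double-quoted, as it is in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
public class ItemIdExtractor {
    // Assumption: the href value is double-quoted and carries an obj=<digits> parameter.
    private static final Pattern OBJ_IN_HREF =
            Pattern.compile("href=\"[^\"]*[?&]obj=(\\d+)");

    // Returns the digits of the obj parameter, or an empty Optional if no link matches.
    public static Optional<String> extractItemId(String html) {
        Matcher m = OBJ_IN_HREF.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "</th><td><a class=\"external exitstitial\" rel=\"nofollow\" "
                + "href=\"http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151\">"
                + "Look up price</a>";
        System.out.println(extractItemId(html).orElse("no match")); // prints 4151
    }
}
```

Tying the pattern to href and obj= means digits elsewhere in the markup are ignored. It is still a regex over HTML, so it will miss single-quoted or unquoted attribute values.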

Upvotes: 4

asgs

Reputation: 3984

Here are a few parser libraries: htmlparser, jsoup, and jtidy.

In your case a regex may be fine, but see the classic post on why you should avoid regex for HTML parsing.

Upvotes: 3

ipr101

Reputation: 24236

This regex would get you the number:

// Matches the first run of digits in the input.
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = null;
if (regexMatcher.find()) {
    resultString = regexMatcher.group();
}

This code is not tested and presumes your HTML string is assigned to the 'subjectString' variable. Note that it returns the first run of digits anywhere in that string.
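One caveat worth illustrating: find() with a bare \d+ stops at the first digits it sees. The snippet from the question happens to contain no digits before 4151, but other markup might. Here is a small sketch of the difference. The colspan markup and the FirstDigitsDemo/firstMatch names are hypothetical and only serve the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class FirstDigitsDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    public static String firstMatch(String regex, int group, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Assumed markup: a colspan attribute appears before the link.
        String html = "<td colspan=\"2\"><a href=\"viewitem.ws?obj=4151\">Look up price</a>";
        System.out.println(firstMatch("\\d+", 0, html));       // prints 2
        System.out.println(firstMatch("obj=(\\d+)", 1, html)); // prints 4151
    }
}
```

Anchoring the pattern on obj= is a cheap way to make the match less dependent on what else is in the string.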

Upvotes: 0
