Reputation:
I'm wondering how I could extract '4151' from the following code:
</th><td><a class="external exitstitial" rel="nofollow" href="http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151">Look up price</a>
I would like to use regex, but if there is a better way I'm open to it!
Upvotes: 2
Views: 273
Reputation: 6567
The following works for me, assuming the href attribute value was already extracted:
String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
// Capture the digits that follow "?obj="
Pattern p = Pattern.compile("\\?obj=(\\d+)");
Matcher m = p.matcher(href);
if (m.find()) {
    System.out.println(m.group(1));
}
Outputs "4151"
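If you would rather not use regex once you have the href, one alternative is to let java.net.URI split off the query string and then look for the obj parameter yourself. This is only a sketch: the class name ObjParam and the helper objParam are made up for illustration, and it assumes the href is a well-formed URI.

```java
import java.net.URI;
import java.util.Optional;

class ObjParam {
    // Hypothetical helper: returns the value of the "obj" query parameter, if present.
    static Optional<String> objParam(String href) {
        // getQuery() returns the text after '?', or null when there is no query part
        String query = URI.create(href).getQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            // Split each "key=value" pair at the first '=' only
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("obj")) {
                return Optional.of(kv[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
        System.out.println(objParam(href).orElse("not found"));
    }
}
```

Unlike the regex, this does not care where obj sits among other query parameters, but it does throw an IllegalArgumentException if the href is not a valid URI.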
Upvotes: 4
Reputation: 3984
Here are a few parser libraries: htmlparser, jsoup, and jtidy.
In your case regex may be fine, but here's a classic post on why you should avoid regex for HTML parsing.
Upvotes: 3
Reputation: 24236
This regex would get you the number:
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = null;
if (regexMatcher.find()) {
    // group() returns the first run of digits found in the input
    resultString = regexMatcher.group();
}
This code is not tested and presumes your HTML string is assigned to the 'subjectString' variable.
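Note that a bare \d+ returns the first run of digits anywhere in the string, so it would pick up the wrong value if the surrounding HTML ever contained an earlier number. A slightly more defensive sketch, assuming the id always follows "obj=" in the link, anchors the match on that parameter name. The class and method names here (ObjIdFromHtml, extractObjId) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObjIdFromHtml {
    // Matches "obj=" preceded by '?' or '&' and captures the digits that follow
    private static final Pattern OBJ_ID = Pattern.compile("[?&]obj=(\\d+)");

    // Hypothetical helper: returns the id as a string, or null if nothing matches
    static String extractObjId(String html) {
        Matcher m = OBJ_ID.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String subjectString = "</th><td><a class=\"external exitstitial\" rel=\"nofollow\" "
                + "href=\"http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151\">Look up price</a>";
        System.out.println(extractObjId(subjectString));
    }
}
```

This is still regex over HTML, so it shares the usual fragility; it just narrows what counts as a match.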
Upvotes: 0