Reputation: 14849
I have this HTML:
<ul><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=userdata&levelFirstItem=0">Zugangsdaten</a></li><li><a href="/web3/setBookingTemplate.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=template&levelFirstItem=1">Buchungsvorlagen</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=showFavorites&levelFirstItem=2">Hotelfavoriten</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=showLightHistory&levelFirstItem=3">Buchungshistorie</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=showHotelRating&levelFirstItem=4">Hotelbewertung</a></li></ul>
How can I extract any HREF ending in levelFirstItem=2? Example:
/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&cid=6-1&activity=showFavorites&levelFirstItem=2
Upvotes: 0
Views: 249
Reputation: 40877
Or possibly /href="(.*?)"/
assuming the regexp engine you're using treats a ? after a quantifier as making it non-greedy (most Perl-style engines do).
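To illustrate why the non-greedy form matters, here is a minimal sketch in Java (whose regex engine does support `?` as a lazy modifier). The class name `LazyVsGreedy`, the helper `firstGroup`, and the two-link sample string are made up for the example and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: returns capture group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample with two links on one line, like the HTML in the question.
        String html = "<a href=\"/a?x=1\">A</a><a href=\"/b?x=2\">B</a>";

        // Lazy: stops at the first closing quote.
        System.out.println(firstGroup("href=\"(.*?)\"", html));
        // prints: /a?x=1

        // Greedy: runs on to the last quote on the line.
        System.out.println(firstGroup("href=\"(.*)\"", html));
        // prints: /a?x=1">A</a><a href="/b?x=2
    }
}
```

Note that this pattern returns every href, so the result would still need to be filtered for levelFirstItem=2.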
Upvotes: 0
Reputation: 43815
This will capture everything within the quotes, but only for an href that ends in levelFirstItem=2:
/href="([^"]*levelFirstItem=2)"/
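As a rough sketch of how that pattern could be applied in Java (the class name `HrefExtractor`, the method `extract`, and the shortened sample links are assumptions for illustration, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // [^"]* cannot cross a closing quote, so each match stays inside one href value.
    // The closing quote right after "levelFirstItem=2" means "levelFirstItem=20" will not match.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*levelFirstItem=2)\"");

    // Hypothetical helper: collects every href value that ends in levelFirstItem=2.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {          // find() scans the whole input, match by match
            result.add(m.group(1)); // group 1 is the value without the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the links in the question.
        String html = "<ul>"
                + "<li><a href=\"/web3/setBookingTemplate.do?activity=template&levelFirstItem=1\">Buchungsvorlagen</a></li>"
                + "<li><a href=\"/web3/showProfile.do?activity=showFavorites&levelFirstItem=2\">Hotelfavoriten</a></li>"
                + "<li><a href=\"/web3/showProfile.do?activity=other&levelFirstItem=20\">Other</a></li>"
                + "</ul>";
        System.out.println(extract(html));
        // prints: [/web3/showProfile.do?activity=showFavorites&levelFirstItem=2]
    }
}
```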
Upvotes: 3
Reputation: 34711
/href="([^"]*)"/
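And in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("href=\"([^\"]*)\"");
Matcher m = p.matcher(line);
// Use find(), not matches(): matches() only succeeds if the entire line is a single href="..."
while (m.find()) {
    String href = m.group(1); // the attribute value without the quotes
}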
Upvotes: 0
Reputation: 1911
In general, it's better to find an HTML library that will allow you to grab information from HTML. Using regular expressions will get very messy quickly.
What language are you using? I'm sure people here can direct you to a good HTML parsing library for any popular language.
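If the language turns out to be Java, one option that needs no extra dependency is the HTML parser bundled with Swing. The following is only a sketch under that assumption: the class name `ParserHrefExtractor`, the method `hrefsEndingWith`, and the shortened sample link are invented for the example, and a dedicated third-party HTML library may well be a better fit for real pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ParserHrefExtractor {
    // Hypothetical helper: parses the markup and returns the href of every <a> tag
    // whose value ends with the given suffix.
    static List<String> hrefsEndingWith(String html, String suffix) {
        List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null && href.toString().endsWith(suffix)) {
                        result.add(href.toString());
                    }
                }
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question.
        String html = "<ul>"
                + "<li><a href=\"/web3/setBookingTemplate.do?levelFirstItem=1\">Buchungsvorlagen</a></li>"
                + "<li><a href=\"/web3/showProfile.do?levelFirstItem=2\">Hotelfavoriten</a></li>"
                + "</ul>";
        System.out.println(hrefsEndingWith(html, "levelFirstItem=2"));
    }
}
```

The filtering happens on the parsed attribute value rather than on raw text, so it does not depend on quoting style or on how the attributes are laid out in the tag.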
Upvotes: 3