Reputation: 470
I'm working on a blog/news aggregator and need some help with regex parsing, I think :P
I need to find what's between the // and the first / in the <link> element, so that I can display the source site properly. How do I do that?
<link>http://www.arabdemocracy.com/2012/09/syria-enter-worst-case-scenario.html</link>
Upvotes: 0
Views: 37
Reputation: 72616
With the following pattern you can get what you need (at least for the input string you've given):
<(\w+?)>[\w\W]+?//([\w\.]+?)/[\w\W]+?</\1>
The part you need is in the second capture group.
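As a rough illustration, here is the pattern applied in Python. The question doesn't name a language, so Python and the helper name `source_host` are assumptions; the sample string is the one from the question.

```python
import re
from typing import Optional

# The pattern from this answer: group 1 captures the tag name (reused by \1
# to match the closing tag), group 2 captures the text between "//" and the next "/".
PATTERN = re.compile(r"<(\w+?)>[\w\W]+?//([\w\.]+?)/[\w\W]+?</\1>")

def source_host(link_element: str) -> Optional[str]:
    """Return the host part of a <link>...</link> string, or None if it doesn't match."""
    match = PATTERN.search(link_element)
    return match.group(2) if match else None

sample = "<link>http://www.arabdemocracy.com/2012/09/syria-enter-worst-case-scenario.html</link>"
print(source_host(sample))  # www.arabdemocracy.com
```

One caveat: `[\w\.]` does not match `-`, so a hyphenated host (e.g. a hypothetical `http://arab-democracy.com/...`) returns None. Replacing `([\w\.]+?)` with `([^/]+?)` would accept it.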
Keep in mind, though, that regular expressions are not the best tool for parsing HTML (or XML). Use a DOM parser library if you can.
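A minimal sketch of the parser route, assuming the feed is well-formed RSS 2.0 XML and using only Python's standard library. Instead of a regex, it reads each `<item>`'s `<link>` with `xml.etree.ElementTree` and takes the host with `urllib.parse.urlparse`. The feed string and the `item_hosts` name are hypothetical; only the URL comes from the question.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical minimal feed; only the <link> URL is taken from the question.
FEED = """<rss version="2.0"><channel>
<item><title>Example</title>
<link>http://www.arabdemocracy.com/2012/09/syria-enter-worst-case-scenario.html</link></item>
</channel></rss>"""

def item_hosts(feed_xml: str) -> list:
    """Return the network location of every <item><link> in an RSS feed."""
    root = ET.fromstring(feed_xml)
    hosts = []
    # iter("item") skips the channel-level <link>, which RSS feeds also carry
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            # netloc is the part after "//" up to the path (it includes any port)
            hosts.append(urlparse(link.strip()).netloc)
    return hosts

print(item_hosts(FEED))  # ['www.arabdemocracy.com']
```

For feeds from untrusted sources, the Python docs recommend hardening XML parsing against malicious input (e.g. with the defusedxml package).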
Upvotes: 1