Zen_silence

Reputation: 329

Using RegexKitLite to parse <a href="">Links</a> out of an NSString

I am writing an iPhone app that has to pull raw HTML data off a website and grab the URL and the displayed text of each link.

For example, in the link <a href="www.google.com">Click here to go to google</a>

it would grab url = www.google.com and text = Click here to go to google

I'm using the RegexKitLite library, but I'm in no way an expert on regular expressions. I have tried several things to get this working.

I want to use the following code:

NSString *searchString  = @"$10.23, $1024.42, $3099";
NSString *regexString   = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray  *capturesArray = NULL;

capturesArray = [searchString arrayOfCaptureComponentsMatchedByRegex:regexString];

So my question is: can someone tell me what the searchString would be to parse HTML links, or point me to a clear tutorial on how RegexKitLite works? I have tried reading the documentation at http://regexkit.sourceforge.net/RegexKitLite/ and I don't understand it.

Thanks in advance,

Zen_silence

Upvotes: 0

Views: 1098

Answers (3)

Zen_silence

Reputation: 329

In case anyone else has this same question, the regex string I used to match an HTML link is:

NSString *regexString = @"<a href=([^>]*)>([^>]*) - ";

The O'Reilly book "Mastering Regular Expressions" helped me figure this out really quickly. I highly recommend reading it if you are trying to use regular expressions.

Upvotes: 0

searchString would be the whole raw HTML text, and regexString should be more like:

NSString *regexString = @"href=\"(.*?)\">(.*?)<";

Then you would use the capture groups to pull out match 1 (the URL) and match 2 (the link text), repeating the match through the HTML text with the range option so that each search starts past what you have already searched.

I don't know what you are trying to do with searchString and the numbers though.
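The capture-and-repeat idea above can be sketched roughly as follows. This is only an illustration, assuming RegexKitLite.h/.m have been added to the project and the target links against libicucore. It reuses the arrayOfCaptureComponentsMatchedByRegex: method from the question, which does the repeated matching for you; the helper name LinksInHTML is made up for this example. The href pattern uses lazy quantifiers (.*?) so each capture stops at the end of its own link.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"  // assumption: RegexKitLite is in the project, linked with -licucore

// Hypothetical helper: returns one dictionary per link found in html,
// with the keys "url" and "text".
static NSArray *LinksInHTML(NSString *html) {
    // Lazy quantifiers keep each capture from running on into later links.
    NSString *regexString = @"href=\"(.*?)\">(.*?)<";
    NSMutableArray *links = [NSMutableArray array];

    // Each match is itself an array: index 0 is the whole match,
    // index 1 is the first capture group, index 2 the second.
    for (NSArray *captures in [html arrayOfCaptureComponentsMatchedByRegex:regexString]) {
        [links addObject:[NSDictionary dictionaryWithObjectsAndKeys:
                             [captures objectAtIndex:1], @"url",
                             [captures objectAtIndex:2], @"text",
                             nil]];
    }
    return links;
}
```

Here searchString is the raw HTML itself, for example LinksInHTML(@"<a href=\"www.google.com\">Click here to go to google</a>"). Note that this pattern only handles links where href is the last attribute and the link text sits on one line.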

Upvotes: 0

bbum

Reputation: 162712

In short, don't do that. Regular expressions are a horrible way to parse HTML. HTML documents are highly structured with a hierarchy of tags whose contents may span lines without said lines appearing in the rendered form.

Assuming well-formed HTML (that is, HTML that is also valid XML), you can use an XML parser.

In particular, the iPhone SDK offers NSXMLParser, and there are some good examples of its usage.
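A minimal sketch of what that might look like for collecting links, assuming the input really is well-formed XML and the code is compiled with ARC. The LinkCollector class and the LinksInXHTML helper are hypothetical names made up for this example, not part of the SDK.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that records the href and text of every <a> element.
@interface LinkCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *links;        // dictionaries with "url" and "text"
@property (nonatomic, copy) NSString *currentHref;          // href of the <a> being read
@property (nonatomic, strong) NSMutableString *currentText; // nil while outside an <a>
@end

@implementation LinkCollector

- (instancetype)init {
    if ((self = [super init])) {
        _links = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // XML element names are case-sensitive, so <A> would not match here.
    if ([elementName isEqualToString:@"a"]) {
        self.currentHref = attributeDict[@"href"];
        self.currentText = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element; does nothing outside an <a>
    // because currentText is nil there.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"a"] && self.currentText != nil) {
        [self.links addObject:@{ @"url": self.currentHref ?: @"",
                                 @"text": [self.currentText copy] }];
        self.currentHref = nil;
        self.currentText = nil;
    }
}

@end

// Hypothetical helper: returns the collected links, or nil if parsing failed.
static NSArray *LinksInXHTML(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    LinkCollector *collector = [[LinkCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? collector.links : nil;
}
```

Unlike the regex approach, this copes with link text that spans lines or contains nested tags, but the parse fails outright on markup that is not well-formed, which is common in real-world HTML.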

Upvotes: 4
