agierens

Getting certain parts of strings

If I have a string with the following value:

<div style="clear:both;"></div>
                <div style="float:left;">
                    <div style="float:left; height:27px; font-size:13px; padding-top:2px;">
                        <div style="float:left;"><a href="http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3" rel="nofollow" target="_blank" style="color:green;">Download</a></div>

How can I get just the <a href="http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3" part out of it? I apologise if there are already posts about this; I couldn't find any.

nebs

Here's an example of using regular expressions to find substrings. It looks for href=" and then for the first quote (") after it. Once these two indexes are found, the string between them is returned.

Regular expressions aren't really needed in my example; you could use simple NSString methods to find the substrings instead.

This is just a hard-coded example that fits your specific case. In practice you're better off using a DOM/XML parser for something like this.

Also, I'm assuming you want to extract the actual URL and don't care about the surrounding <a href=" markup.

Also note that this function returns nil when the string contains no complete href="..." value.

- (NSString *)stringByExtractingAnchorTagURLFromString:(NSString *)dom {
    NSError *error = nil;

    // Find the "href=" part (case-insensitive, so HREF=" matches too)
    NSRegularExpression *firstRegexp = [NSRegularExpression regularExpressionWithPattern:@"href=\"" options:NSRegularExpressionCaseInsensitive error:&error];
    NSTextCheckingResult *firstResult = [firstRegexp firstMatchInString:dom options:0 range:NSMakeRange(0, [dom length])];
    if (firstResult == nil) {
        return nil; // no href=" anywhere in the string
    }

    NSUInteger startIndex = NSMaxRange(firstResult.range);

    // Find the first quote (") character after the href="
    NSRegularExpression *secondRegexp = [NSRegularExpression regularExpressionWithPattern:@"\"" options:0 error:&error];
    NSTextCheckingResult *secondResult = [secondRegexp firstMatchInString:dom options:0 range:NSMakeRange(startIndex, [dom length] - startIndex)];
    if (secondResult == nil) {
        return nil; // the href value is never closed
    }

    NSUInteger endIndex = secondResult.range.location;

    // The URL is the string between these two found locations
    return [dom substringWithRange:NSMakeRange(startIndex, endIndex - startIndex)];
}

This is how I tested it:

NSString *dom = @"<div style=\"clear:both;\"></div><div style=\"float:left;\"><div style=\"float:left; height:27px; font-size:13px; padding-top:2px;\"><div style=\"float:left;\"><a href=\"http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3\" rel=\"nofollow\" target=\"_blank\" style=\"color:green;\">Download</a></div>";
NSString *result = [self stringByExtractingAnchorTagURLFromString:dom];
NSLog(@"Result: %@", result);

The test prints:

Result: http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3
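As noted above, regular expressions aren't strictly needed for the single-URL case. Here is a minimal sketch of the same two-step search using only NSString's rangeOfString:options: and rangeOfString:options:range:. It assumes the href value is double-quoted, and URLFromAnchorString is a hypothetical name written as a plain C function so it can be dropped into any file:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the URL inside the first href="..." in html,
// or nil if there is no complete, double-quoted href value.
static NSString *URLFromAnchorString(NSString *html) {
    // Locate href=" (case-insensitive, so HREF=" also matches)
    NSRange hrefRange = [html rangeOfString:@"href=\"" options:NSCaseInsensitiveSearch];
    if (hrefRange.location == NSNotFound) {
        return nil;
    }
    NSUInteger start = NSMaxRange(hrefRange);

    // Locate the closing quote, searching only after the opening one
    NSRange quoteRange = [html rangeOfString:@"\""
                                     options:0
                                       range:NSMakeRange(start, html.length - start)];
    if (quoteRange.location == NSNotFound) {
        return nil;
    }

    // The URL is everything between the opening and closing quotes
    return [html substringWithRange:NSMakeRange(start, quoteRange.location - start)];
}
```

Calling URLFromAnchorString(dom) with the test string above should give the same URL as the regex version.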

UPDATE -- Multiple HREFs

For multiple hrefs, use this function, which returns an array of NSStrings holding the URLs:

- (NSArray *)anchorTagURLsFromString:(NSString *)dom {
    NSError *error = nil;
    NSMutableArray *urls = [NSMutableArray array];

    // First find all matching hrefs in the dom
    NSRegularExpression *firstRegexp = [NSRegularExpression regularExpressionWithPattern:@"href=\"" options:NSRegularExpressionCaseInsensitive error:&error];
    NSArray *matches = [firstRegexp matchesInString:dom options:0 range:NSMakeRange(0, [dom length])];

    // The closing-quote pattern is the same for every match, so compile it once
    NSRegularExpression *secondRegexp = [NSRegularExpression regularExpressionWithPattern:@"\"" options:0 error:&error];

    // Go through all matches and extract the URL
    for (NSTextCheckingResult *match in matches) {
        NSUInteger startIndex = NSMaxRange(match.range);

        // Find the first quote (") character after the href="
        NSTextCheckingResult *secondResult = [secondRegexp firstMatchInString:dom options:0 range:NSMakeRange(startIndex, [dom length] - startIndex)];
        if (secondResult == nil) {
            continue; // unterminated href value; skip it
        }

        NSUInteger endIndex = secondResult.range.location;

        [urls addObject:[dom substringWithRange:NSMakeRange(startIndex, endIndex - startIndex)]];
    }

    return urls;
}

This is how I tested it:

NSString *dom2 = @"<div style=\"clear:both;\"></div><div style=\"float:left;\"><div style=\"float:left; height:27px; font-size:13px; padding-top:2px;\"><div style=\"float:left;\"><a href=\"http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3\" rel=\"nofollow\" target=\"_blank\" style=\"color:green;\">Download</a><a href=\"http://www.google.com/blabla\" rel=\"nofollow\" target=\"_blank\" style=\"color:green;\">Download</a></div>";
NSArray *urls = [self anchorTagURLsFromString:dom2];
for (NSString *url in urls) {
    NSLog(@"URL: %@", url);
}

This is the output of the test:

URL: http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3
URL: http://www.google.com/blabla
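If you do want to stay with regular expressions, a single pattern with a capture group can match href=" and its closing quote in one step, so each URL comes straight out of capture group 1 via rangeAtIndex:. This is a swapped-in variant rather than the two-regex approach above. It is a minimal sketch that assumes every href value is double-quoted, and AnchorURLsInString is a hypothetical name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every double-quoted href value in html.
static NSArray *AnchorURLsInString(NSString *html) {
    // Pattern: href=" then a captured run of non-quote characters, then the closing "
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSMutableArray *urls = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture group 1 holds just the URL, without the surrounding quotes
        [urls addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return urls;
}
```

Because the pattern requires the closing quote, an unterminated href simply produces no match instead of needing a separate check.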
