JoeS

Reputation: 231

Most memory-efficient way to split an NSString into substrings

I have the following code:

    int start = [html rangeOfString:@"class=WordSection1>"].location + 24;
    int end = [html rangeOfString:@"<div class=\"endofsections\">"].location;
    self.parts = [[NSMutableArray alloc] init];

    NSString* startHtml = [html substringToIndex:start - 1];
    NSString* mainHtml = [html substringWithRange:NSMakeRange(start - 1, end - start - 1)];
    NSString* endHtml = [html substringFromIndex:end];
    // !! At this point we have the string in memory twice
    [html release];

    [self.parts addObject: startHtml];

    NSArray *splitHtml = [mainHtml componentsSeparatedByString:@"<p class=NumberedParagraph>"];
    //[mainHtml release]; <-- this causes bad access errors. Does the split do a copy or does it just create a new set of pointers but use the same memory?

    BOOL first = YES; // the first piece has no separator in front of it
    for (NSString* part in splitHtml) {
        if (first) {
            [self.parts addObject: part];
            first = NO;
        } else {
            [self.parts addObject: [NSString stringWithFormat:@"<p class=NumberedParagraph>%@", part]];
        }
    }

    [self.parts addObject:endHtml];

The issue with this is that html is about 20 MB. I split it into startHtml, mainHtml and endHtml, and only then release html. Until that release, all four NSStrings are in memory at once, so the app is using an extra 40 MB or so.

I then split mainHtml and assign the substrings to an NSArray called splitHtml, which again means the same text is stored in memory twice. When I try to release mainHtml, I get an EXC_BAD_ACCESS error.

Is there any way to avoid holding two copies of the same data in memory before the original is released?

I plan to replace the for loop with a while loop that removes the processed NSStrings from splitHtml. The loop condition will be satisfied when splitHtml is empty. This is so that as the parts array consumes more memory the splitHtml array consumes less memory. Do I need to release each NSString or can I just remove it and have the array consume less memory as a whole?

Thanks,

Joe

Upvotes: 2

Views: 382

Answers (2)

bbum

Reputation: 162712

Parsing HTML using rangeOfString:, NSScanner or regular expressions is futile. It might work for your test case, but it is going to break as soon as the HTML changes.

For example, keep in mind that:

    <div class="endofsections">

and:

    <div    class="endofsections"   id=1 
        title="End Of Sections"  >

are identical as far as the class attribute is concerned, yet a literal string search will only find the first.

Use a proper HTML parser.
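To make the brittleness concrete, here is a minimal sketch of the literal match the question's code depends on. The helper name `ContainsLiteralMarker` is hypothetical and exists only for illustration; the marker string is the one from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): returns YES only when the
// exact marker text occurs in html, which is the same literal comparison
// that rangeOfString: performs in the question's code.
static BOOL ContainsLiteralMarker(NSString *html) {
    NSString *marker = @"<div class=\"endofsections\">";
    // rangeOfString: reports location == NSNotFound when nothing matches.
    return [html rangeOfString:marker].location != NSNotFound;
}
```

Passing the compact form of the tag gives YES, while the equivalent tag with extra whitespace and attributes gives NO. In the question's code that second case would leave `end` set to NSNotFound, and the substring calls that follow would be working with a meaningless index.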

Upvotes: 2

Bastian

Reputation: 10433

You can't release mainHtml because you don't own it: substringWithRange: returns an autoreleased object. The autorelease pool will send it release later, after your method has returned, so if you have already released it yourself the object is over-released at that point and the app crashes.

You could try moving the split into a separate function that returns the array, perhaps with its own autorelease pool that is drained before the function returns, so that the temporary strings are released right away instead of when the outer pool drains.
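A minimal sketch of that idea follows, under some assumptions: the function name `SplitKeepingSeparator` is made up for illustration, and it uses the `@autoreleasepool` block syntax (rather than an explicit NSAutoreleasePool object) so that it compiles with or without ARC. It also assumes, as is conventional, that componentsSeparatedByString: hands back an autoreleased array.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits mainHtml on separator and returns the pieces,
// putting the separator back in front of every piece except the first.
static NSMutableArray *SplitKeepingSeparator(NSString *mainHtml, NSString *separator) {
    // Created before the inner pool is entered, so the result survives
    // the drain at the end of the @autoreleasepool block.
    NSMutableArray *parts = [NSMutableArray array];

    @autoreleasepool {
        NSArray *pieces = [mainHtml componentsSeparatedByString:separator];
        BOOL first = YES;
        for (NSString *piece in pieces) {
            if (first) {
                [parts addObject:piece];
                first = NO;
            } else {
                // parts retains the new string, so it outlives the pool.
                [parts addObject:[separator stringByAppendingString:piece]];
            }
        }
    } // the temporary array and the unprefixed pieces are released here

    return parts;
}
```

The caller would then do something like `[self.parts addObjectsFromArray:SplitKeepingSeparator(mainHtml, @"<p class=NumberedParagraph>")];`. This does not stop mainHtml and its pieces from coexisting while the split runs, but it does limit how long the intermediate copies stay alive.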

Upvotes: 1
