webo80

Reputation: 3393

Performance improvement in Core Data relationship

I have two pre-populated Core Data entities (around 50k records each) that share a relationship and its inverse, and I need to populate that relationship. It's almost a 1:1 relation: the two entities have an attribute in common, and two objects should be related whenever that attribute is equal.

I'm trying to do it in a brute-force way, and I'm running into a lot of memory issues (it quickly escalates to memory warnings).

@autoreleasepool {        
    NSFetchRequest *e2sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity2"];
    e2sRequest.includesPropertyValues = NO;
    e2sRequest.includesSubentities = NO;
    NSArray *e2s = [self.fatherMOC executeFetchRequest:e2sRequest error:nil];

    if(e2s.count > 0) {
        NSFetchRequest *e1sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity1"];
        e1sRequest.includesPropertyValues = NO;
        e1sRequest.includesSubentities = NO;
        NSArray *e1s = [self.fatherMOC executeFetchRequest:e1sRequest error:nil];

        for(Entity1 *e1 in e1s) {
            NSString *attributeInCommon = e1.attributeInCommon;
            NSPredicate *predicate = [NSPredicate predicateWithFormat:@"attributeInCommon = %@", attributeInCommon];
            Entity2 *e2matching = (Entity2 *)[e2s filteredArrayUsingPredicate:predicate].lastObject;
            if(e2matching) {
                e1.e2 = e2matching;
            }
        }
    }
}

I've tried holding the common attribute and the objectID in memory in an NSDictionary, with no luck. I've tried a couple of other approaches, some terribly slow, others terrible memory hogs.

I know that I should check the errors, and I know I could do this in fewer lines of code, but think of it as quick debug code; it'll be fixed later.

Thanks in advance

Upvotes: 1

Views: 267

Answers (2)

Mundi

Reputation: 80265

I suppose this operation (matching 50,000 entities against 50,000 other entities based on a common string attribute that acts as a unique key) is not something you want to repeat on users' devices. Rather, it seems you need to do it once, when preparing the seed data.

Therefore there is actually no need to optimize, because time and (on the simulator) memory won't be an issue.

So just perform this in batches, e.g. as follows:

  • fetch 1000 e1
  • fetch 1000 corresponding e2 with a predicate
  • link
  • save
  • drain memory
  • repeat

Some hints:

To get distinct chunks of 1,000 records, add a sort descriptor and use fetchOffset and fetchLimit.

The predicate for getting the records would be something like this.

NSArray *attributes = [e1Results valueForKeyPath:@"attributeInCommon"];
request.predicate =
    [NSPredicate predicateWithFormat:@"attributeInCommon IN %@", attributes];
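Putting the batching steps together, a minimal sketch could look like the following. It assumes the entity and attribute names from the question; `moc` is a hypothetical managed object context, and error handling is omitted for brevity:

```objc
// One-off seeding pass: link Entity1 to Entity2 in chunks of 1,000.
// Assumes entity/attribute names from the question; "moc" is hypothetical.
static const NSUInteger kChunkSize = 1000;
NSUInteger offset = 0;

while (YES) {
    @autoreleasepool {
        NSFetchRequest *e1Request = [[NSFetchRequest alloc] initWithEntityName:@"Entity1"];
        e1Request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"attributeInCommon"
                                                                    ascending:YES]];
        e1Request.fetchOffset = offset;
        e1Request.fetchLimit = kChunkSize;
        NSArray *e1Results = [moc executeFetchRequest:e1Request error:NULL];
        if (e1Results.count == 0) break; // all chunks processed

        // Fetch only the Entity2 rows that can match this chunk.
        NSArray *attributes = [e1Results valueForKeyPath:@"attributeInCommon"];
        NSFetchRequest *e2Request = [[NSFetchRequest alloc] initWithEntityName:@"Entity2"];
        e2Request.predicate = [NSPredicate predicateWithFormat:@"attributeInCommon IN %@",
                               attributes];
        NSArray *e2Results = [moc executeFetchRequest:e2Request error:NULL];

        // Index this chunk's Entity2 objects by the common attribute for O(1) lookup.
        NSDictionary *e2ByAttribute =
            [NSDictionary dictionaryWithObjects:e2Results
                                        forKeys:[e2Results valueForKeyPath:@"attributeInCommon"]];
        for (Entity1 *e1 in e1Results) {
            Entity2 *match = e2ByAttribute[e1.attributeInCommon];
            if (match) e1.e2 = match;
        }

        [moc save:NULL];        // persist the batch…
        offset += kChunkSize;   // …then let the pool drain it
    }
}
```

Paginating with fetchOffset stays stable here because the loop only sets a relationship and never modifies the sort key.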

Upvotes: 1

Wain

Reputation: 119031

You're trying to load 100,000 items all at the same time, so it's no wonder you have memory issues.

You need to batch, and if you create an autorelease pool you need to let it drain periodically (so it needs to wrap each batch).

So, set a fetchBatchSize on the first fetch request. Then iterate over the results, taking fetchBatchSize items at a time. This is where the pool should go, so it's released after each batch. Start with a batch size of 100 and see how it goes.

Each batch then makes the second query with a predicate to limit to the set of values that can actually match with the current batch.

Then run your matching logic.
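A rough sketch of that structure, reusing the names from the question (the batch size and the `moc` context are assumptions, not part of the answer):

```objc
// Sketch: iterate Entity1 with fetchBatchSize, wrapping each batch in its own pool.
static const NSUInteger kBatchSize = 100; // starting point; tune as needed

NSFetchRequest *e1Request = [[NSFetchRequest alloc] initWithEntityName:@"Entity1"];
e1Request.fetchBatchSize = kBatchSize; // rows are faulted in kBatchSize at a time
NSArray *e1s = [moc executeFetchRequest:e1Request error:NULL];

for (NSUInteger start = 0; start < e1s.count; start += kBatchSize) {
    @autoreleasepool { // drained after each batch
        NSRange range = NSMakeRange(start, MIN(kBatchSize, e1s.count - start));
        NSArray *batch = [e1s subarrayWithRange:range];

        // Second query, limited to values that can actually match this batch.
        NSFetchRequest *e2Request = [[NSFetchRequest alloc] initWithEntityName:@"Entity2"];
        e2Request.predicate = [NSPredicate predicateWithFormat:@"attributeInCommon IN %@",
                               [batch valueForKeyPath:@"attributeInCommon"]];
        NSArray *e2s = [moc executeFetchRequest:e2Request error:NULL];

        // Matching logic as in the question, restricted to the current batch.
        for (Entity1 *e1 in batch) {
            NSPredicate *p = [NSPredicate predicateWithFormat:@"attributeInCommon = %@",
                              e1.attributeInCommon];
            Entity2 *match = [[e2s filteredArrayUsingPredicate:p] lastObject];
            if (match) e1.e2 = match;
        }
        [moc save:NULL]; // persist before the pool drains the batch
    }
}
```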

Consider also using the Core Data tool in Instruments to check what's happening, how many requests you make to the data store and how long it all takes.

Upvotes: 1
