Gabriele

Reputation: 1183

Massive Multithreading Operations

EDITED WITH NEW CODE BELOW

I'm relatively new to multithreading, but to achieve my goal quickly and learn something new, I decided to build a multithreaded app.

The goal: parse a huge number of strings from a file and save every word into the SQLite database using Core Data. Huge because the number of words is around 300,000 ...

So this is my approach.

Step 1. Parse all the words in the file, placing them into a huge NSArray. (Done quickly.)

Step 2. Create an NSOperationQueue and insert the NSBlockOperations.

The main problem is that the process starts very quickly but then slows down very soon. I'm using an NSOperationQueue with the maximum concurrent operation count set to 100. I have a Core 2 Duo processor (dual core without HT).

I've seen that using NSOperationQueue there is a lot of overhead in creating the NSOperations (with dispatching of the queue suspended, it takes about 3 minutes just to create 300k NSOperations). CPU usage goes to 170% when I start dispatching the queue.

I also tried removing the NSOperationQueue and using GCD (the 300k loop completes instantaneously; see the commented lines), but CPU usage is only 95% and the problem is the same as with NSOperations: very soon the process slows down.

Any tips on how to do this well?

Here is some code (original question code):

- (void)insertWords:(NSArray *)words insideDictionary:(Dictionary *)dictionary {
    NSDate *creationDate = [NSDate date];

    __block NSUInteger counter = 0;

    NSArray *dictionaryWords = [dictionary.words allObjects];
    NSMutableSet *coreDataWords = [NSMutableSet setWithCapacity:words.count];

    NSLog(@"Begin Adding Operations");

    for (NSString *aWord in words) {

        void(^wordParsingBlock)(void) = ^(void) {
            @synchronized(dictionary) {
                NSManagedObjectContext *context = [(PRDGAppDelegate*)[[NSApplication sharedApplication] delegate] managedObjectContext];                

                [context lock];

                Word *toSaveWord = [NSEntityDescription insertNewObjectForEntityForName:@"Word" inManagedObjectContext:context];
                [toSaveWord setCreated:creationDate];
                [toSaveWord setText:aWord];
                [toSaveWord addDictionariesObject:dictionary];

                [coreDataWords addObject:toSaveWord];
                [dictionary addWordsObject:toSaveWord];

                [context unlock];

                counter++;
                [self.countLabel performSelectorOnMainThread:@selector(setStringValue:) withObject:[NSString stringWithFormat:@"%lu/%lu", counter, words.count] waitUntilDone:NO];

            }
        };

        [_operationsQueue addOperationWithBlock:wordParsingBlock];
//        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
//        dispatch_async(queue, wordParsingBlock);
    }
    NSLog(@"Operations Added");
}

Thank you in advance.

Edit...

Thanks to Stephen Darlington I rewrote my code and figured out the problem. The most important thing is: do not share Core Data objects between threads. In other words, do not mix Core Data objects retrieved from different contexts.

Sharing them is what led me to use @synchronized(dictionary), which resulted in slow-motion execution! I then removed the massive NSOperation creation, using just MAXTHREADS instances (2 or 4 instead of 300k ... a huge difference).

Now I can parse 300k+ strings in just 30-40 seconds. Impressive!! I still have some issues (with just 1 thread it seems to parse more words than there are, and with more than 1 thread it doesn't parse all the words ... I need to figure that out), but the code is now really efficient. Maybe the next step could be using OpenCL and pushing it onto the GPU :)

Here is the new code:

- (void)insertWords:(NSArray *)words forLanguage:(NSString *)language {
    NSDate *creationDate = [NSDate date];
    NSPersistentStoreCoordinator *coordinator = [(PRDGAppDelegate*)[[NSApplication sharedApplication] delegate] persistentStoreCoordinator];

    // The number of words to be parsed by the single thread.
    NSUInteger wordsPerThread = (NSUInteger)ceil((double)words.count / (double)MAXTHREADS);

    NSLog(@"Start Adding Operations");
    // Here I minimized the number of threads. Every thread will parse and convert a finite number of words instead of 1 word per thread.
    for (NSUInteger threadIdx = 0; threadIdx < MAXTHREADS; threadIdx++) {

        // The NSBlockOperation.
        void(^threadBlock)(void) = ^(void) {
            // A new Context for the current thread.
            NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
            [context setPersistentStoreCoordinator:coordinator];

            // Dictionary now is in accordance with the thread context.
            Dictionary *dictionary = [PRDGMainController dictionaryForLanguage:language usingContext:context];

            // Stat Variable. Needed to update the UI.
            NSTimeInterval beginInterval = [[NSDate date] timeIntervalSince1970];
            NSUInteger operationPerInterval = 0;

            // The NSOperation Core. It create a CoreDataWord.
            for (NSUInteger wordIdx = 0; wordIdx < wordsPerThread && wordsPerThread * threadIdx + wordIdx < words.count; wordIdx++) {
                // The String to convert
                NSString *aWord = [words objectAtIndex:wordsPerThread * threadIdx + wordIdx];

                // Some Exceptions to skip certain words.
                if (...) {
                    continue;
                }

                // CoreData Conversion.
                Word *toSaveWord = [NSEntityDescription insertNewObjectForEntityForName:@"Word" inManagedObjectContext:context];
                [toSaveWord setCreated:creationDate];
                [toSaveWord setText:aWord];
                [toSaveWord addDictionariesObject:dictionary];

                operationPerInterval++;

                NSTimeInterval endInterval = [[NSDate date] timeIntervalSince1970];

                // Update case.
                if (endInterval - beginInterval > UPDATE_INTERVAL) {

                    NSLog(@"Thread %lu Processed %lu words", threadIdx, wordIdx);

                    // UI Update. It will be updated only by the first queue.
                    if (threadIdx == 0) {

                        // UI Update code.
                    }
                    beginInterval = endInterval;
                    operationPerInterval = 0;
                }
            }

            // When the NSOperation goes to finish the CoreData thread context is saved.
            [context save:nil];
            NSLog(@"Operation %lu finished", threadIdx);
        };

        // Add the NSBlockOperation to queue.
        [_operationsQueue addOperationWithBlock:threadBlock];
    }
    NSLog(@"Operations Added");
}

Upvotes: 3

Views: 345

Answers (2)

Stephen Darlington

Reputation: 52565

A few thoughts:

  • Setting max concurrent operations so high is not going to have much effect. It's unlikely to be more than two if you have two cores.
  • It looks as though you're using the same NSManagedObjectContext for all your processes. This is Not Good.
  • Let's assume that your max concurrent operations was 100. The bottleneck would be the main thread, where you're trying to update a label for every operation. Try updating the main thread every n records instead of for every one.
  • You shouldn't need to lock the context if you're using Core Data correctly... which means using a different context for each thread.
  • You don't seem to ever save the context?
  • Batching operations is a good way to improve performance... but see the previous point.
  • As you suggest, there's an overhead in creating a GCD operation. Creating a new one for each word is probably not optimal. You need to balance the overhead of creating a new process against the benefits of parallelisation.

In short, threading is hard, even when you use something like GCD.

Upvotes: 2

bryanmac

Reputation: 39296

It's hard to say without measuring and profiling, but what looks suspicious to me is that you're saving the full dictionary of words saved so far with every individual word. So the amount of data per save gets successively larger and larger.

// the dictionary at this point contains all words saved so far
// which each contains a full dictionary
[toSaveWord addDictionariesObject:dictionary];

// add each time so it gets bigger each time
[dictionary addWordsObject:toSaveWord];

So, each save is saving more and more data. Why save a dictionary of all words with each word?

Some other thoughts:

  • why build up coreDataWords that you never use?
  • I wonder if you're getting the concurrency you're expecting, since you're synchronizing the full block of work.

Things to try:

  • Comment out the dictionary on the toSaveWord, in addition to the dictionary you're building up, and try again - see whether it's your data/data structures or the DB/Core Data.
  • Do the first, but also create a serial version of it to see if you're actually getting concurrency benefits.

Upvotes: 0
