Reputation: 1183
EDITED WITH NEW CODE BELOW
I'm relatively new to multithreading, but to reach my goal quickly and learn something new along the way, I decided to build a multithreaded app.
The goal: parse a huge number of strings from a file and save every word into the SQLite database using Core Data. Huge because there are around 300,000 words.
So this is my approach.
Step 1. Parse all the words in the file and place them into a huge NSArray. (Done quickly.)
Step 2. Create the NSOperationQueue and add the NSBlockOperations to it.
The main problem is that the process starts very quickly but then slows down very soon. I'm using an NSOperationQueue with the maximum concurrent operation count set to 100. I have a Core 2 Duo processor (dual core without HT).
I have seen that with NSOperationQueue there is a lot of overhead in creating the NSOperations (with the queue's dispatching stopped, it takes about 3 minutes just to create 300k NSOperations). CPU usage goes to 170% when I start dispatching the queue.
I also tried removing the NSOperationQueue and using GCD (the 300k loop finishes instantly; see the commented-out lines), but CPU usage is only 95% and the problem is the same as with NSOperations: the process slows down very soon.
Any tips on how to do this well?
Here is some code (the original question's code):
- (void)inserdWords:(NSArray *)words insideDictionary:(Dictionary *)dictionary {
    NSDate *creationDate = [NSDate date];
    __block NSUInteger counter = 0;
    NSArray *dictionaryWords = [dictionary.words allObjects];
    NSMutableSet *coreDataWords = [NSMutableSet setWithCapacity:words.count];
    NSLog(@"Begin Adding Operations");
    for (NSString *aWord in words) {
        void(^wordParsingBlock)(void) = ^(void) {
            @synchronized(dictionary) {
                NSManagedObjectContext *context = [(PRDGAppDelegate*)[[NSApplication sharedApplication] delegate] managedObjectContext];
                [context lock];
                Word *toSaveWord = [NSEntityDescription insertNewObjectForEntityForName:@"Word" inManagedObjectContext:context];
                [toSaveWord setCreated:creationDate];
                [toSaveWord setText:aWord];
                [toSaveWord addDictionariesObject:dictionary];
                [coreDataWords addObject:toSaveWord];
                [dictionary addWordsObject:toSaveWord];
                [context unlock];
                counter++;
                [self.countLabel performSelectorOnMainThread:@selector(setStringValue:) withObject:[NSString stringWithFormat:@"%lu/%lu", counter, words.count] waitUntilDone:NO];
            }
        };
        [_operationsQueue addOperationWithBlock:wordParsingBlock];
        // dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        // dispatch_async(queue, wordParsingBlock);
    }
    NSLog(@"Operations Added");
}
Thank you in advance.
Edit...
Thanks to Stephen Darlington I rewrote my code and figured out the problem. The most important thing is: do not share Core Data objects between threads. That means do not mix Core Data objects retrieved from different contexts.
Sharing them is what led me to use @synchronized(dictionary), which resulted in slow-motion code execution! Then I got rid of the massive NSOperation creation and now use just MAXTHREADS instances (2 or 4 instead of 300k is a huge difference).
Now I can parse 300k+ strings in just 30 to 40 seconds. Impressive!! I still have some issues (it seems to parse more words than there are with just 1 thread, and it does not parse all the words when there is more than 1 thread; I need to figure that out), but now the code is really efficient. Maybe the next step could be using OpenCL and running it on the GPU :)
Here is the new code:
- (void)insertWords:(NSArray *)words forLanguage:(NSString *)language {
    NSDate *creationDate = [NSDate date];
    NSPersistentStoreCoordinator *coordinator = [(PRDGAppDelegate*)[[NSApplication sharedApplication] delegate] persistentStoreCoordinator];
    // The number of words each thread has to parse.
    NSUInteger wordsPerThread = (NSUInteger)ceil((double)words.count / (double)MAXTHREADS);
    NSLog(@"Start Adding Operations");
    // Keep the number of threads small: every thread parses and converts a
    // fixed-size chunk of words instead of 1 word per operation.
    for (NSUInteger threadIdx = 0; threadIdx < MAXTHREADS; threadIdx++) {
        // The block that becomes the NSBlockOperation.
        void(^threadBlock)(void) = ^(void) {
            // A new context for the current thread.
            NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
            [context setPersistentStoreCoordinator:coordinator];
            // The dictionary is now fetched through this thread's context.
            Dictionary *dictionary = [PRDGMainController dictionaryForLanguage:language usingContext:context];
            // Statistics variables, needed to update the UI.
            NSTimeInterval beginInterval = [[NSDate date] timeIntervalSince1970];
            NSUInteger operationPerInterval = 0;
            // The core of the operation: it creates one Core Data Word per string.
            for (NSUInteger wordIdx = 0; wordIdx < wordsPerThread && wordsPerThread * threadIdx + wordIdx < words.count; wordIdx++) {
                // The string to convert.
                NSString *aWord = [words objectAtIndex:wordsPerThread * threadIdx + wordIdx];
                // Some exceptions to skip certain words.
                if (...) {
                    continue;
                }
                // Core Data conversion.
                Word *toSaveWord = [NSEntityDescription insertNewObjectForEntityForName:@"Word" inManagedObjectContext:context];
                [toSaveWord setCreated:creationDate];
                [toSaveWord setText:aWord];
                [toSaveWord addDictionariesObject:dictionary];
                operationPerInterval++;
                NSTimeInterval endInterval = [[NSDate date] timeIntervalSince1970];
                // Periodic progress update.
                if (endInterval - beginInterval > UPDATE_INTERVAL) {
                    NSLog(@"Thread %lu Processed %lu words", threadIdx, wordIdx);
                    // UI update. Only the first operation updates the UI.
                    if (threadIdx == 0) {
                        // UI Update code.
                    }
                    beginInterval = endInterval;
                    operationPerInterval = 0;
                }
            }
            // When the operation is about to finish, this thread's context is saved.
            [context save:nil];
            NSLog(@"Operation %lu finished", threadIdx);
        };
        // Add the block to the queue as an NSBlockOperation.
        [_operationsQueue addOperationWithBlock:threadBlock];
    }
    NSLog(@"Operations Added");
}
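The word-count mismatch mentioned in the edit is easier to reason about when the chunk arithmetic is checked on its own. Here is a minimal sketch in plain C (which also compiles as Objective-C), assuming the same ceil-based split as the code above. The names Chunk, chunk_for_thread and covers_all_indices are hypothetical and exist only for this illustration, and threads is assumed to be greater than zero.

```c
#include <stddef.h>

/* Half-open index range [begin, end) handled by one thread. */
typedef struct {
    size_t begin;
    size_t end;
} Chunk;

/* Same split as wordsPerThread above: ceil(count / threads) words per
   thread, with trailing chunks clamped to the end of the array.
   Assumes threads > 0. */
Chunk chunk_for_thread(size_t count, size_t threads, size_t threadIdx) {
    size_t perThread = (count + threads - 1) / threads; /* integer ceil */
    size_t begin = perThread * threadIdx;
    size_t end = begin + perThread;
    if (begin > count) begin = count;
    if (end > count) end = count;
    Chunk c = { begin, end };
    return c;
}

/* Returns 1 if the chunks of all threads tile [0, count) exactly once,
   with no gap and no overlap; returns 0 otherwise. */
int covers_all_indices(size_t count, size_t threads) {
    size_t next = 0;
    for (size_t t = 0; t < threads; t++) {
        Chunk c = chunk_for_thread(count, threads, t);
        if (c.begin != next || c.end < c.begin) {
            return 0;
        }
        next = c.end;
    }
    return next == count;
}
```

Calling covers_all_indices(words.count, MAXTHREADS) before queuing the operations is one way to rule the split out. If it returns 1, every index is assigned to exactly one thread, so a difference between the number of parsed words and the number of stored words would have to come from somewhere else, for example the words skipped by the exception check.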
Upvotes: 3
Views: 345
Reputation: 52565
A few thoughts:
You appear to use the same NSManagedObjectContext for all your processes. This is Not Good.
In short, threading is hard, even when you use something like GCD.
Upvotes: 2
Reputation: 39296
It's hard to say without measuring and profiling, but what looks suspicious to me is that you're saving the full dictionary of all the words saved so far along with each word. So the amount of data per save gets successively larger and larger.
// the dictionary at this point contains all words saved so far
// which each contains a full dictionary
[toSaveWord addDictionariesObject:dictionary];
// add each time so it gets bigger each time
[dictionary addWordsObject:toSaveWord];
So, each save is saving more and more data. Why save a dictionary of all words with each word?
Some other thoughts:
Things to try:
Upvotes: 0