Reputation: 11
New to objective-c, need help to solve this:
Write a function that takes two parameters:
1 a String representing a text document and
2 an integer providing the number of items to return. Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Use your best judgement to decide how words are separated. Your solution should run in O(n) time where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structures.
What I tried so far (work in progress): ` // Function work in progress
// -(NSString *) wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument ;
// Get the desktop directory (where the text document is)
NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];
// Create full path to the file
NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];
// Load the string
NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
// Optional code for confirmation - Check that the file is here and print its content to the console
// NSLog(@" The string is:%@", content);
// Create an array with the words contain in the string
NSArray *myWords = [content componentsSeparatedByString:@" "];
// Optional code for confirmation - Print content of the array to the console
// NSLog(@"array: %@", myWords);
// Take an NSCountedSet of objects in an array and order those objects by their object count then returns a sorted array, sorted in descending order by the count of the objects.
NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
NSMutableArray *dictArray = [NSMutableArray array];
[countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
[dictArray addObject:@{@"word": obj,
@"count": @([countedSet countForObject:obj])}];
}];
NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
}
return 0;
}
Upvotes: 1
Views: 919
Reputation: 178491
This is a classic job for map-reduce. I am very familiar with objective-c, but as far as I know - these concepts are very easily implemented in it.
1st map-reduce is counting the number of occurances.
This step is basically grouping elements according to the word, and then counting them.
map(text):
for each word in text:
emit(word,'1')
reduce(word,list<number>):
emit (word,sum(number))
An alternative for using map-reduce is to use iterative calculation and a hash-map which will be a histogram that counts number of occurances per word.
After you have a a list of numbers and occurances, all you got to do is actually get top k out of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
In here, the 'comparator' is #occurances of each word, as calculated in previous step.
The basic idea is to use a min-heap, and store k
first elements in it.
Now, iterate the remaining of the elements, and if the new one is bigger than the top (minimal element in the heap), remove the top and replace it with the new element.
At the end, you have a heap containing k
largest elements, and they are already in a heap - so they are already sorted (though in reversed order, but dealing with it is fairly easy).
Complexity is O(nlogK)
To achieve O(n + klogk)
you may use selection algorithm instead of the min-heap solution to get top-k, and then sort the retrieved elements.
Upvotes: 1