user3534708
user3534708

Reputation: 11

Printing the most frequent words in a file(string) Objective-C

New to objective-c, need help to solve this:

Write a function that takes two parameters:

What I tried so far (work in progress): ` // Function work in progress

// -(NSString *) wordFrequency:(int)itemsToReturn  inDocument:(NSString *)textDocument ;
//  Get the desktop directory (where the text document is)

NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];

 //  Create full path to the file
 NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];

 //  Load the string
 NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
 //  Optional code for confirmation - Check that the file is here and print its content to the console
 //  NSLog(@" The string is:%@", content);

 // Create an array with the words contain in the string
  NSArray *myWords = [content componentsSeparatedByString:@" "];

 //  Optional code for confirmation - Print content of the array to the console
 //  NSLog(@"array: %@", myWords);
 //  Take an NSCountedSet of objects in an array and order those objects by their object count then returns a sorted array, sorted in descending order by the count of the objects.

  NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
  NSMutableArray *dictArray = [NSMutableArray array];
  [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
  [dictArray addObject:@{@"word": obj,
                               @"count": @([countedSet countForObject:obj])}];
    }];

  NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
 }
return 0;
 }

Upvotes: 1

Views: 919

Answers (1)

amit
amit

Reputation: 178491

This is a classic job for map-reduce. I am very familiar with objective-c, but as far as I know - these concepts are very easily implemented in it.

1st map-reduce is counting the number of occurances.
This step is basically grouping elements according to the word, and then counting them.

map(text):
   for each word in text:
       emit(word,'1')
reduce(word,list<number>):
    emit (word,sum(number))

An alternative for using map-reduce is to use iterative calculation and a hash-map which will be a histogram that counts number of occurances per word.

After you have a a list of numbers and occurances, all you got to do is actually get top k out of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
In here, the 'comparator' is #occurances of each word, as calculated in previous step.

The basic idea is to use a min-heap, and store k first elements in it.
Now, iterate the remaining of the elements, and if the new one is bigger than the top (minimal element in the heap), remove the top and replace it with the new element.

At the end, you have a heap containing k largest elements, and they are already in a heap - so they are already sorted (though in reversed order, but dealing with it is fairly easy).

Complexity is O(nlogK)

To achieve O(n + klogk) you may use selection algorithm instead of the min-heap solution to get top-k, and then sort the retrieved elements.

Upvotes: 1

Related Questions