Reputation: 12087
I have an NSString with a number of sentences, and I'd like to split it into an NSArray of sentences. Has anybody solved this problem before? I found enumerateSubstringsInRange:options:usingBlock: which is able to do it, but it looks like it isn't available on the iPhone (Snow Leopard only). I thought about splitting the string based on periods, but that doesn't seem very robust.
So far my best option seems to be to use RegexKitLite to regex it into an array of sentences. Solutions?
Upvotes: 4
Views: 2566
Reputation: 6782
NSArray *sentences = [astring componentsSeparatedByCharactersInSet:[NSCharacterSet punctuationCharacterSet]];
Upvotes: 0
Reputation: 131
How about:
NSArray *sentences = [string componentsSeparatedByString:@". "];
For the string "One. Two. Three." this returns the array ("One", "Two", "Three."). The last element keeps its period because no space follows it.
Upvotes: 0
Reputation: 6991
I would use a scanner for it:
NSScanner *sherLock = [NSScanner scannerWithString:yourString]; // autoreleased
NSMutableArray *theArray = [NSMutableArray array]; // autoreleased
while (![sherLock isAtEnd]) {
    NSString *sentence = nil;
    // Scan up to ". " (a period plus a space), which your sentences will
    // probably end with. You could also try scanning for a newline (\n),
    // but I'm not sure your sentences are separated by one.
    if ([sherLock scanUpToString:@". " intoString:&sentence]) {
        [theArray addObject:sentence];
    }
    // scanUpToString: stops before the delimiter, so consume it here;
    // otherwise the loop never advances.
    [sherLock scanString:@". " intoString:NULL];
}
This should do it. There may be some small mistakes in it, but this is how I would approach it. You should look up NSScanner in the docs, though; you might come across a method that suits this situation better.
Upvotes: 3
Reputation: 64428
I haven't used them for a while, but I think you can do this with NSString, NSCharacterSet and NSScanner. You create a character set that holds the end-of-sentence punctuation and then call -[NSScanner scanUpToCharactersFromSet:intoString:]. Each scan pulls one sentence out into a string; skip past the punctuation it stopped at and keep calling the method until the scanner runs out of string.
Of course, the text has to be well punctuated.
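A minimal sketch of that approach might look like the following. The function name SentencesByScanning and the terminator set ".!?" are assumptions made for illustration, not part of the answer above, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `text` at runs of '.', '!' or '?'.
// The terminator set is an assumption; adjust it for your text.
static NSArray *SentencesByScanning(NSString *text) {
    NSMutableArray *sentences = [NSMutableArray array];
    if (text.length == 0) {
        return sentences;
    }

    NSCharacterSet *terminators =
        [NSCharacterSet characterSetWithCharactersInString:@".!?"];
    NSScanner *scanner = [NSScanner scannerWithString:text];

    while (![scanner isAtEnd]) {
        NSString *sentence = nil;
        // Collect everything up to (not including) the next terminator.
        // By default the scanner skips leading whitespace and newlines.
        if ([scanner scanUpToCharactersFromSet:terminators
                                    intoString:&sentence]) {
            [sentences addObject:sentence];
        }
        // Consume the terminator run so the next pass makes progress.
        [scanner scanCharactersFromSet:terminators intoString:NULL];
    }
    return sentences;
}
```

Note that the terminators themselves are dropped from the results, and that abbreviations or decimals such as "3.14" would be split as well, which is the "well punctuated" caveat in practice.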
Upvotes: 1
Reputation: 96323
Use CFStringTokenizer. You'll want to create the tokenizer with the kCFStringTokenizerUnitSentence option.
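As a rough sketch of how that could look, the function below walks the sentence tokens and collects each one as a substring. The name SentencesInString is hypothetical, the use of the current locale is an assumption, and the bridged cast assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the sentences of `text` as found by
// CFStringTokenizer with kCFStringTokenizerUnitSentence.
static NSArray *SentencesInString(NSString *text) {
    NSMutableArray *sentences = [NSMutableArray array];
    if (text.length == 0) {
        return sentences;
    }

    CFStringRef cfText = (__bridge CFStringRef)text;
    CFLocaleRef locale = CFLocaleCopyCurrent(); // assumption: current locale
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(
        kCFAllocatorDefault,
        cfText,
        CFRangeMake(0, CFStringGetLength(cfText)),
        kCFStringTokenizerUnitSentence,
        locale);

    // Advance token by token until the tokenizer reports no more tokens.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) !=
           kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSRange nsRange = NSMakeRange((NSUInteger)range.location,
                                      (NSUInteger)range.length);
        [sentences addObject:[text substringWithRange:nsRange]];
    }

    // Objects obtained from Create/Copy functions must be released.
    CFRelease(tokenizer);
    CFRelease(locale);
    return sentences;
}
```

Each token range may include the whitespace that follows a sentence, so you may want to trim the substrings with stringByTrimmingCharactersInSet: before using them.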
Upvotes: 9