Reputation: 9596
In short:
Given the following string:
Input string -> "hello, world" , oh my, parapappa12
I want to extract these three "tokens":
Output tokens ->
Tokenizing string in ios
I got a file containing some data. It looks something like:
word , word, word
word , word, word
word , word, word
where some words can contain a "," but only when the word starts and ends with a certain character, e.g. starts with " and ends with "
Example of words:
word : blebla bla bla
word : "bla bla bla, bla"
How do I define a regular expression that tokenizes the file on ",", ignoring white space between the words and handling this "special" case?
I remember using regex in Perl to achieve something similar, but that was a long time ago and I have mostly forgotten the syntax. I am also not sure whether this is supported in Objective-C on iOS.
Upvotes: 1
Views: 430
Reputation: 7534
Without knowing why you need to parse strings like this I can't give you a great answer, but here are some ideas that might work better than RegEx if you find yourself needing to parse something more complicated, or if you would just like to learn more about state machines and grammars.
NSScanner
(the code from that link isn't great, so ignore it, but the concept is illustrated)
You seem content with RegEx, but maybe this will help future visitors.
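To make the NSScanner idea concrete, here is a minimal sketch of how such a scanner loop might look. It is an illustration under assumptions, not code from the link: the helper name TokenizeLine is hypothetical, quoted fields are assumed to contain no escaped quotes, and quoted tokens keep their surrounding quotes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any library): splits one line on commas,
// keeping commas that appear inside double-quoted fields.
// Assumption: quoted fields contain no escaped quotes.
static NSArray<NSString *> *TokenizeLine(NSString *line)
{
    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:line];
    // Turn off automatic skipping so whitespace inside quotes is preserved.
    scanner.charactersToBeSkipped = nil;

    while (![scanner isAtEnd]) {
        // Drop leading whitespace before the next field.
        [scanner scanCharactersFromSet:whitespace intoString:NULL];

        NSString *token = nil;
        if ([scanner scanString:@"\"" intoString:NULL]) {
            // Quoted field: take everything up to the closing quote.
            NSString *body = nil;
            [scanner scanUpToString:@"\"" intoString:&body];
            [scanner scanString:@"\"" intoString:NULL];
            token = [NSString stringWithFormat:@"\"%@\"", body ?: @""];
            // Ignore anything between the closing quote and the next comma.
            [scanner scanUpToString:@"," intoString:NULL];
        } else {
            // Plain field: take everything up to the next comma, then trim.
            [scanner scanUpToString:@"," intoString:&token];
            token = [token stringByTrimmingCharactersInSet:whitespace];
        }

        if (token.length > 0) {
            [tokens addObject:token];
        }
        // Consume the separator, if there is one.
        [scanner scanString:@"," intoString:NULL];
    }
    return tokens;
}
```

For the sample input from the question, this sketch should yield three tokens, with "oh my" kept as a single token because only the comma acts as a separator.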
Upvotes: 0
Reputation: 22988
First, a Perl one-liner:
# echo -n '"hello, world" , oh my, parapappa12' | perl -ne 'print "<$1>\n" while /("[^"]*"|[^, ]+)/g'
<"hello, world">
<oh>
<my>
<parapappa12>
And here is the Objective-C method:
NSString* const str = @"\"hello, world\" , oh my, parapappa12";
[self splitCommas:str];

// Logs each token: either a double-quoted run or a run without commas or spaces.
- (void)splitCommas:(NSString*)str
{
    NSString* const pattern = @"(\"[^\"]*\"|[^, ]+)";
    NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:pattern
                                                                      options:0
                                                                        error:nil];
    // Search the whole string.
    NSRange searchRange = NSMakeRange(0, [str length]);
    NSArray *matches = [regex matchesInString:str
                                      options:0
                                        range:searchRange];
    for (NSTextCheckingResult *match in matches) {
        // Each match covers exactly one token.
        NSRange matchRange = [match range];
        NSLog(@"%@", [str substringWithRange:matchRange]);
    }
}
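Explanation for the regex:
"[^"]*" matches a double quote, then any run of characters other than a double quote, then the closing double quote.
[^, ]+ matches one or more characters that are neither a comma nor a space.
The square brackets define a "character class" and the leading caret negates it. The | between the two alternatives means the quoted form is tried first.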
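Because [^, ]+ also stops at spaces, the pattern above splits "oh my" into two tokens, as the Perl output shows. If commas should be the only separators, one possible variant is sketched below. The helper name TokensKeepingInnerSpaces and the modified pattern are my own illustration, not part of the original method; the pattern assumes unquoted tokens do not start with a double quote.

```objc
#import <Foundation/Foundation.h>

// Hypothetical variant: only commas separate tokens, so spaces inside an
// unquoted token are kept while leading and trailing spaces are dropped.
static NSArray<NSString *> *TokensKeepingInnerSpaces(NSString *str)
{
    // Alternative 1: a double-quoted run.
    // Alternative 2: starts with a character that is not a comma, quote or
    // whitespace, and (if longer than one character) ends with a character
    // that is not a comma or whitespace.
    NSString *pattern = @"\"[^\"]*\"|[^,\"\\s](?:[^,]*[^,\\s])?";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, str.length);
    for (NSTextCheckingResult *match in [regex matchesInString:str
                                                       options:0
                                                         range:whole]) {
        [tokens addObject:[str substringWithRange:match.range]];
    }
    return tokens;
}
```

The trade-off is a slightly harder-to-read pattern in exchange for not needing a separate trimming step.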
Note: My solution doesn't handle escaped quotes like in "I say \"Hello\""
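If escaped quotes do need to be supported, one common approach is to let the quoted alternative accept either an ordinary character or a backslash followed by any character. The following is a sketch under that assumption; the helper name TokensAllowingEscapedQuotes is hypothetical, and the escapes are left in the returned tokens rather than being unescaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical variant of the pattern above that tolerates backslash-escaped
// characters (such as \") inside a quoted token.
static NSArray<NSString *> *TokensAllowingEscapedQuotes(NSString *str)
{
    // Regex: "(?:[^"\\]|\\.)*"|[^, ]+
    // Inside quotes, accept any character except a quote or backslash,
    // or a backslash followed by any single character.
    NSString *pattern = @"\"(?:[^\"\\\\]|\\\\.)*\"|[^, ]+";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, str.length);
    for (NSTextCheckingResult *match in [regex matchesInString:str
                                                       options:0
                                                         range:whole]) {
        // Tokens are returned as written, with quotes and escapes intact.
        [tokens addObject:[str substringWithRange:match.range]];
    }
    return tokens;
}
```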
Upvotes: 1