mbm29414

Reputation: 11598

Tokenize an NSString for filtering data (search)

I'm trying to implement search filtering for a data source that is used to populate a UITableView.

Basically, I want to let people enter multiple words, split that one string into tokens, and then iterate through each object in the data source to check whether every search token appears somewhere in the object's properties or sub-properties.

If the user simply types in multiple words separated by spaces, this is a trivial case of using -componentsSeparatedByString:.

I'm trying, however, to also solve the case where a user might put in a comma-separated list of items.

So, the easy entry to tokenize is this:

"word1 word2 word3"

I want to also be able to tokenize this:

"word1, word2, word3"

The problem I see is that, because I don't assume that the user will enter commas, I can't simply replace/remove white space.

I see some kludgy ways to implement what I want, which basically consist of splitting first on white space, then iterating that array and splitting on commas, then iterating the combined array and removing "empty" tokens. I think this would work, but I was hoping for a more elegant way to do this, especially since I might decide to add a third delimiter at some point, which would make this solution considerably more complex.

So far, I'm intrigued by using NSCharacterSet in combination with -componentsSeparatedByCharactersInSet:. I'm having trouble with this method, though.

Here's what I'm trying so far:

NSMutableCharacterSet *delimiters = [NSMutableCharacterSet characterSetWithCharactersInString:@","];
[delimiters addCharactersInString:@" "];
NSArray *tokens = [searchText componentsSeparatedByCharactersInSet:delimiters];

The problem I'm encountering is this:

Suppose searchText (above) is "word,". In that case, my tokens array becomes this:

[@"word", @""]

So, even trying this out, it would appear (at first glance) that I would still have to iterate the tokens array to remove empty items. Again, this is possible, but I have a feeling there's a better way.
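For reference, the cleanup step described above could be sketched roughly as follows. This is only an illustration, not a recommended final answer: the helper name TokensFromSearchText is hypothetical, and it assumes spaces and commas are the only delimiters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original post): splits on spaces and
// commas, then drops the empty strings that adjacent delimiters produce.
static NSArray<NSString *> *TokensFromSearchText(NSString *searchText) {
    // Assumption: only space and comma act as delimiters.
    NSCharacterSet *delimiters =
        [NSCharacterSet characterSetWithCharactersInString:@", "];
    NSArray<NSString *> *pieces =
        [searchText componentsSeparatedByCharactersInSet:delimiters];

    // Keep only the pieces whose length is greater than zero.
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [pieces filteredArrayUsingPredicate:nonEmpty];
}
```

Adding a third delimiter would then only mean adding one more character to the delimiter string, although the extra filtering pass is still needed.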

Is there a better way? Am I misusing NSCharacterSet?

Upvotes: 1

Views: 300

Answers (1)

rdelmar

Reputation: 104082

Use enumerateSubstringsInRange:options:usingBlock:, and pass NSStringEnumerationByWords as the option. This will separate the string into individual words, and strip out any spaces, commas, semicolons, etc. For instance, this code,

- (void)viewDidLoad {
    [super viewDidLoad];
    NSMutableArray *words = [NSMutableArray new];
    NSString *text = @"these are  , some, words with commas; semicolons: colons and period.";
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length) options:NSStringEnumerationByWords  usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        [words addObject:substring];
    }];

    NSLog(@"%@", words);
}

gives this output,

2014-10-22 11:13:25.728 GettingWordsFromStringProblem[859:270592] (
    these,
    are,
    some,
    words,
    with,
    commas,
    semicolons,
    colons,
    and,
    period
)
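To connect this back to the filtering described in the question, here is a minimal sketch of how the words might be used as search tokens. It assumes, purely for illustration, that the data source is an array of plain strings; the function names WordsInString and ItemsMatchingSearch are hypothetical, and a real data source would compare against whichever object properties matter.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the words of a string using word enumeration.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [words addObject:substring];
    }];
    return words;
}

// Hypothetical helper: returns the items that contain every search token,
// ignoring case. Assumes the items are plain strings.
static NSArray<NSString *> *ItemsMatchingSearch(NSArray<NSString *> *items,
                                                NSString *searchText) {
    NSArray<NSString *> *tokens = WordsInString(searchText);
    NSMutableArray<NSString *> *matches = [NSMutableArray array];
    for (NSString *item in items) {
        BOOL matchesAll = YES;
        for (NSString *token in tokens) {
            NSRange found = [item rangeOfString:token
                                        options:NSCaseInsensitiveSearch];
            if (found.location == NSNotFound) {
                matchesAll = NO;
                break;
            }
        }
        if (matchesAll) {
            [matches addObject:item];
        }
    }
    return matches;
}
```

With this sketch, an empty search string produces no tokens, so every item is treated as a match; whether that is the right behavior depends on the search UI.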

Upvotes: 2
