bertday
bertday

Reputation: 10971

Check strings for same characters in Objective-C

I have an array of strings, from which I would like to extract only those with unique character sets. (For example, "asdf" and "fdsa" would be considered redundant). This is the method I am currently using:

NSMutableArray *uniqueCharSets = [[NSMutableArray alloc] init];
NSMutableArray *uniqueStrings = [[NSMutableArray alloc] init];        

for (NSString *_string in unique) {
    NSCharacterSet *_charSet = [NSCharacterSet characterSetWithCharactersInString:_string];
    if (![uniqueCharSets containsObject:_charSet]) {
        [uniqueStrings addobject:_string];
        [uniqueCharSets addObject:_charSet];
    }
}

This seems to work, but it's very slow and resource-intensive. Can anyone think of a better way to do this?

Upvotes: 6

Views: 1853

Answers (3)

Alex Reynolds
Alex Reynolds

Reputation: 96937

  1. Using an NSDictionary, map each string's lexicographically-sorted equivalent to an NSArray of input strings: (e.g. adfs => [afsd, asdf, ...])
  2. Walk through the dictionary, printing out keys (or their values) which only have single-element array values

Upvotes: 1

Manlio
Manlio

Reputation: 10865

The only thing that come in my mind is not to use containsObject: since NSMutableArray is not ordered (in general), we can assume that containsObject simply iterates the array starting from the beginning until he finds the object. This means O(n) (n comparisons in the worst case).

A better solution may consists in keeping the array ordered and use a custom search method using a dichotomic approach. This way you'll have a O(log n) complexity.
Of course, you must take care of keeping your array ordered (much more efficient than add and reorder), so you should use insertObject:atIndex: method to insert the element properly.

Upvotes: 0

Joshua Weinberg
Joshua Weinberg

Reputation: 28688

I just put together a quick example of how I would approach this, but it turns out that it is more, odd, than you first expect. For one, NSCharacterSet doesn't implement equality to check contents. It only uses the pointer value. Based on this your example will NOT work properly.

My approach is to use an NSSet to deal with the hashing of these for us.

@interface StringWrapper : NSObject
@property (nonatomic, copy) NSString *string;
@property (nonatomic, copy) NSData *charSetBitmap;
- (id)initWithString:(NSString*)aString;
@end

@implementation StringWrapper
@synthesize string, charSetBitmap;

- (id)initWithString:(NSString*)aString;
{
    if ((self = [super init]))
    {
        self.string = aString;
    }
    return self;
}

- (void)setString:(NSString *)aString;
{
    string = [aString copy];
    self.charSetBitmap = [[NSCharacterSet characterSetWithCharactersInString:aString] bitmapRepresentation];
}

- (BOOL)isEqual:(id)object;
{
    return [self.charSetBitmap isEqual:[object charSetBitmap]];
}

- (NSUInteger)hash;
{
    return [self.charSetBitmap hash];
}

@end

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSMutableSet *stringWrappers = [[NSMutableSet alloc] init];
        NSArray *strings = [NSArray arrayWithObjects:@"abc",@"aaabcccc",@"awea",@"awer",@"abcde", @"ehra", @"QWEQ", @"werawe", nil];
        for (NSString *str in strings)
            [stringWrappers addObject:[[StringWrapper alloc] initWithString:str]];

        NSArray *uniqueStrings = [stringWrappers valueForKey:@"string"];
        NSLog(@"%@", uniqueStrings);

    }
    return 0;
}

The code is pretty straightforward. We create a container object to cache the results of the character set's bitmap representation. We use the bitmap representation because NSData implements isEqual: appropriately.

Upvotes: 0

Related Questions