Developer

Reputation: 6465

How to use regular expression in iPhone app to separate string by , (comma)

I have to read a .csv file that has three columns. While parsing the file, I get each line as a string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",[email protected]. I want to store the values of the three columns in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:

  1. Christopher Bass
  2. Cry the Beloved Country Final Essay
  3. [email protected]

But when a column value already contains a comma, like this: Christopher Bass,\"Cry, the Beloved Country Final Essay\",[email protected], it separates the string into four components because there is a comma after Cry:

  1. Christopher Bass
  2. Cry
  3. the Beloved Country Final Essay
  4. [email protected]

So how can I handle this with a regular expression? I have the RegexKitLite classes, but which regular expression should I use? Please help!

Thanks-

Upvotes: 7

Views: 2138

Answers (5)

Martin Gjaldbaek

Reputation: 3015

Is the title guaranteed to have the quotation marks? And is it the only component that can have them? If so, componentsSeparatedByString:@"\"" should give you this:

  1. Christopher Bass,
  2. Cry, the Beloved Country Final Essay
  3. ,[email protected]

Then use componentsSeparatedByString:@"," or substringToIndex:/substringFromIndex: to get rid of the trailing comma in the first component and the leading comma in the last.

Here's a solution using substring:

NSString* input = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",[email protected]";
NSArray* split = [input componentsSeparatedByString:@"\""];
NSString* part1 = [split objectAtIndex:0];
NSString* part2 = [split objectAtIndex:1];
NSString* part3 = [split objectAtIndex:2];
part1 = [part1 substringToIndex:[part1 length] - 1];
part3 = [part3 substringFromIndex:1];

// Pass the values as arguments, not as the format string itself.
NSLog(@"%@", part1);
NSLog(@"%@", part2);
NSLog(@"%@", part3);

Upvotes: 0

Feysal

Reputation: 623

How about this:

componentsSeparatedByRegex:@",\\\"|\\\","

This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within the three components.

If either of these assumptions is incorrect, other methods to identify string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible in such cases:

Doe, John,\"\"Why Unescaped Strings Suck\", And Other Development Horror Stories\",Doe, John <[email protected]>

Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
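If the exporter does quote its fields consistently, one of the "other methods" could be a small character-by-character scan instead of a regex. The following is only a sketch under assumptions that are not from the question: the helper name SplitCSVLine is made up, and it assumes RFC 4180-style input where a literal quote inside a quoted field is written as two quotes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): splits one CSV line into fields.
// Assumes a field may be wrapped in double quotes and that a literal quote
// inside a quoted field is doubled ("").
static NSArray *SplitCSVLine(NSString *line) {
    NSMutableArray *fields = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    BOOL inQuotes = NO;
    NSUInteger length = [line length];

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [line characterAtIndex:i];
        if (inQuotes && c == '"') {
            if (i + 1 < length && [line characterAtIndex:i + 1] == '"') {
                [current appendString:@"\""];   // doubled quote: keep one literal quote
                i++;
            } else {
                inQuotes = NO;                  // closing quote
            }
        } else if (!inQuotes && c == '"') {
            inQuotes = YES;                     // opening quote
        } else if (!inQuotes && c == ',') {
            // A comma outside quotes ends the current field.
            [fields addObject:[NSString stringWithString:current]];
            [current setString:@""];
        } else {
            // Ordinary character, or a comma inside quotes.
            [current appendString:[NSString stringWithCharacters:&c length:1]];
        }
    }
    // The last field has no trailing comma.
    [fields addObject:[NSString stringWithString:current]];
    return fields;
}
```

Called on a line shaped like the one in the question, this should return three fields, with the comma after Cry kept inside the second one. It still cannot rescue input like the unescaped example above.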

Upvotes: 1

The last part looks like it will never contain a comma. Neither will the first one as far as I can see...

What about splitting the string like this:

NSArray *splitArr = [str componentsSeparatedByString:@","];
NSString *nameStr = [splitArr objectAtIndex:0];
NSString *emailStr = [splitArr lastObject];

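// Rejoin the middle pieces, restoring the commas that the split removed.
NSString *contentStr = @"";
if ([splitArr count] > 2) {
    NSArray *middle = [splitArr subarrayWithRange:NSMakeRange(1, [splitArr count] - 2)];
    contentStr = [middle componentsJoinedByString:@","];
}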

This will use the first and last string as is, and combine the rest into the content.

Kind of a hack, but a name and an email address will never contain a comma, right?

Upvotes: 0

user207616

Reputation:

The regex you're searching for is: \\"(.*)\\"[ ^,]*|([^,]*),

In pseudocode: (('\"' && string_1 && '\"' && 0-n spaces) || string_2 except comma) && comma

NSString *str = @"Christopher Bass,\"Cry, the Beloved Country ,Final Essay\",[email protected],som";
NSString *regEx = @"\\\"(.*)\\\"[ ^,]*|([^,]*),";
NSMutableArray *split = [[str componentsSeparatedByRegex:regEx] mutableCopy];
[split removeObject:@""]; // both capture groups are always returned, so the unmatched one shows up as an empty string
NSLog(@"%@", split);

// OUTPUT:
2012-02-07 17:42:18.778 tmpapp[92170:c03] (
    "Christopher Bass",
    "Cry, the Beloved Country ,Final Essay",
    "[email protected]",
    som
)

RegexKitLite adds both capture groups to the array for every match, so you end up with empty objects in the array. removeObject:@"" deletes those, but if you need to preserve genuinely empty values (e.g. your source has val,,ue), change the code to the following:

str = [str stringByReplacingOccurrencesOfRegex:regEx withString:@"$1$2∏"];
NSArray *split = [str componentsSeparatedByString:@"∏"];

$1 and $2 are those two strings mentioned above, ∏ is in this case a character which will most likely never appear in normal text (and is easy to remember: option-shift-p).
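If RegexKitLite is not a hard requirement, a similar idea can be sketched with Foundation's own NSRegularExpression (iOS 4 and later), matching the fields themselves rather than the separators. This is a different technique from the code above, not a drop-in replacement: the helper name FieldsInCSVLine and the pattern are illustrative assumptions, and the sketch assumes quoted fields contain no embedded double quotes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread). Each match is one field:
// group 1 captures a quoted field without its quotes, group 2 an unquoted one.
// Assumes quoted fields contain no embedded double quotes.
static NSArray *FieldsInCSVLine(NSString *line) {
    NSString *pattern = @"(?:^|,)(?:\"([^\"]*)\"|([^,]*))";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return [NSArray array];
    }

    NSMutableArray *fields = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:line
                                      options:0
                                        range:NSMakeRange(0, [line length])];
    for (NSTextCheckingResult *match in matches) {
        // A group that did not take part in the match reports NSNotFound.
        NSRange quoted = [match rangeAtIndex:1];
        NSRange range = (quoted.location != NSNotFound) ? quoted : [match rangeAtIndex:2];
        [fields addObject:[line substringWithRange:range]];
    }
    return fields;
}
```

Because every field produces exactly one match, empty values such as the middle one in val,,ue come back as empty strings without needing a placeholder character.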

Upvotes: 0

El Developer

Reputation: 3346

Any regular expression would probably run into the same problem. What you need is to sanitize your entries, either by escaping the commas or by wrapping each string in quotation marks, like this: "My string". Otherwise the problem will keep coming back. Good luck.

For your example you would probably need to do something like:

\"Christopher Bass\",\"Cry\, the Beloved Country Final Essay\",\"[email protected]\"

That way you could use a regexp or even the same method from the NSString class.
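If you control the code that writes the file, the sanitizing step could look roughly like the sketch below. The helper names CSVQuoteField and CSVLineFromFields are hypothetical, and the sketch uses the doubled-quote convention from RFC 4180 rather than the backslash-escaped comma shown above, so treat it as one possible convention, not the only one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wrap one value in double quotes for CSV output.
// Any quote inside the value is doubled (RFC 4180 style), so commas and
// quotes in the value can no longer be mistaken for delimiters.
static NSString *CSVQuoteField(NSString *value) {
    NSString *escaped = [value stringByReplacingOccurrencesOfString:@"\""
                                                         withString:@"\"\""];
    return [NSString stringWithFormat:@"\"%@\"", escaped];
}

// Hypothetical helper: build one CSV line from an array of NSString values.
static NSString *CSVLineFromFields(NSArray *fields) {
    NSMutableArray *quoted = [NSMutableArray arrayWithCapacity:[fields count]];
    for (NSString *field in fields) {
        [quoted addObject:CSVQuoteField(field)];
    }
    return [quoted componentsJoinedByString:@","];
}
```

With every field quoted on output, a reader only has to treat commas outside quotation marks as separators.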

Not directly related, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.

Upvotes: 2
