Mr Jones

Reputation: 1198

Creating string array using Regex.Split

Alright, I'm warning you in advance, my understanding of Regular Expressions is extremely limited (I've tried my best to learn them over the years, but to be honest, I think they just frighten me.)

Let's say I have the following string:

string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";

This string represents key-value pairs, delimited by a user-defined string (in this case ||) (e.g. key1=value1||key2=value2). I am trying to extract the keys out of this string and store them in an array. That array would look like this:

{"CustomerId", "OrderId", "UserId"}

The best option I can think of is to use regular expressions (If someone has a better solution, please share). Here's what I'm trying to do:

string delimiter = "||";
string[] keys = Regex.Split(keyValues, "=.*" + delimiter);

I may be wrong, but the way I understand it, that regular expression is supposed to find a string that starts with = and ends with delimiter, with any number of any characters in between. That should split the string at those positions, leaving me with the original keys. Instead, my keys array looks like this:

{"", "C", "u", "s", "t", "o", "m", "e", "r", "I", "d", "", "", ...}

As you can see, the =value|| part is stripped away. Can anyone tell me what I'm doing wrong?

EDIT

In my case, the delimiter || is a variable. I didn't mention this only because I thought I would be able to replace any references to || with delimiter. From the majority of the answers given, I now see that that is an important detail.

Upvotes: 0

Views: 1111

Answers (4)

falsetru

Reputation: 368894

| has a special meaning in regular expressions (patA|patB matches either patA or patB), so you need to escape it.

Using a non-greedy match (.*?):

string delimiter = "||";
string[] keys = Regex.Split(keyValues, @"=.*?" + Regex.Escape(delimiter));

This will give you {"CustomerId", "OrderId", "UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f"}. The last element keeps its value because no delimiter follows it.
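To make the escaping point concrete, here is a minimal sketch of the split above wrapped in a helper. The class and method names (EscapedSplitSketch, SplitOnValueAndDelimiter) are made up for illustration. Unescaped, the pattern "=.*||" is read as three alternatives ("=.*", an empty pattern, and another empty pattern), and the empty alternative matches at every position, which is why the original attempt split between every character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library: names are for illustration only.
public static class EscapedSplitSketch
{
    // Splits on "=<value><delimiter>", treating the delimiter as literal text.
    public static string[] SplitOnValueAndDelimiter(string keyValues, string delimiter)
    {
        // Regex.Escape turns "||" into "\|\|", so it is no longer read as alternation.
        // ".*?" is non-greedy, so each match stops at the nearest delimiter.
        string pattern = "=.*?" + Regex.Escape(delimiter);
        return Regex.Split(keyValues, pattern);
    }
}
```

Called with the string from the question and "||", this should return three elements, with the last one still carrying its "=value" part.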

Using Regex.Matches with lookbehind and lookahead assertions is more appropriate:

string delimiter = "||";
string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
string pattern = @"(?<=^|" + Regex.Escape(delimiter) + @")\w+(?==)";
var keys = Regex.Matches(keyValues, pattern);
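Since the question asks for a string array and Regex.Matches returns a MatchCollection, one possible way to finish the job is sketched below. The helper name (LookaroundKeysSketch.ExtractKeys) is hypothetical, and the sketch assumes keys consist only of \w characters, as the pattern above does.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library: names are for illustration only.
public static class LookaroundKeysSketch
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        // Lookbehind: the key must start at the beginning of the input or right after the delimiter.
        // Lookahead: the key must be followed by '='. Neither assertion consumes characters.
        string pattern = @"(?<=^|" + Regex.Escape(delimiter) + @")\w+(?==)";

        // Cast<Match>() lets LINQ enumerate the MatchCollection as Match objects.
        return Regex.Matches(keyValues, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Because the delimiter goes through Regex.Escape, the same helper should work for other user-defined delimiters as well.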

BTW, use verbatim string literals (@"verbatim string literal") when writing regular expressions.


Upvotes: 3

Guffa

Reputation: 700152

An alternative is to do this without a regular expression, as the string operations are pretty basic:

string[] keys =
  keyValues.Split(new string[]{"||"}, StringSplitOptions.None)
  .Select(s => s.Substring(0, s.IndexOf('='))).ToArray();

Save regular expressions for the more advanced string operations. :)

(When I tested the performance of this solution against a regular expression, it turned out to be about 40 times faster.)
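The edit to the question says the delimiter is a variable, so here is a rough sketch of the same non-regex approach with the delimiter passed in. The names (PlainSplitSketch.ExtractKeys) are invented for the example. The guard for pairs without an = sign is an added assumption, since Substring would otherwise throw when IndexOf returns -1.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of any library: names are for illustration only.
public static class PlainSplitSketch
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        return keyValues
            // Split on the literal delimiter; no escaping is needed outside a regex.
            .Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries)
            .Select(pair =>
            {
                int eq = pair.IndexOf('=');
                // Assumption: a pair without '=' is kept whole instead of throwing.
                return eq < 0 ? pair : pair.Substring(0, eq);
            })
            .ToArray();
    }
}
```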

Upvotes: 1

user557597

Split on @"=[^|]*(?:\|\||$)"
If you need more assurance, use @"=[^=|]*(?:\|\||$)"

Edited to also consume the end of the string, where no delimiter exists.
In C#, this leaves an empty string as the last element, so keep only the non-blank elements.
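A sketch of that idea in C#, adapted for a delimiter held in a variable. The character class [^|] only suits the || delimiter, so this version swaps in a non-greedy .*? with an escaped delimiter, which is a substitution on my part and not the exact pattern above. The names (ConsumeEndSplitSketch.ExtractKeys) are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library: names are for illustration only.
public static class ConsumeEndSplitSketch
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        // Match "=value" followed by either the literal delimiter or the end of the input.
        // The group is non-capturing, so Regex.Split does not add it to the result.
        string pattern = "=.*?(?:" + Regex.Escape(delimiter) + "|$)";

        return Regex.Split(keyValues, pattern)
                    // The match at the end of the input leaves a trailing empty string; drop it.
                    .Where(s => s.Length > 0)
                    .ToArray();
    }
}
```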

Upvotes: 0

Ibrahim Najjar

Reputation: 19423

If you only care about the keys, why not use a match instead of a split:

@"[^=|]+(?==)"

If the key can't contain an equal sign = or a vertical bar |, then the above expression will match one or more characters that are not = or | and that are followed by an equal sign =, thus matching the keys.

In C#:

var input = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
var results = Regex.Matches(input, @"[^=|]+(?==)");

Upvotes: 2
