pfinferno

Reputation: 1945

Using Regex to split string by different characters based on occurrence

I'm currently replacing a very old (and long) C# string parsing class that I think could be condensed into a single regex statement. Being a newbie to Regex, I'm having some issues getting it working correctly.

Description of the possible input strings:

The input string can have up to three words separated by spaces. It can stop there, or it can have an = followed by any number of further words separated by commas. The words can also be enclosed in quotes. If a word is in quotes and contains a space, it should NOT be split at the space.

Examples of input and expected output elements in the string array:

Input1: this is test

Output1: {"this", "is", "test"}

Input2: this is test=param1,param2,param3

Output2: {"this", "is", "test", "param1", "param2", "param3"}

Input3: use file "c:\test file.txt"=param1 , param2,param3

Output3: {"use", "file", "c:\test file.txt", "param1", "param2", "param3"}

Input4: log off

Output4: {"log", "off"}

And the most complex one:

Input5: use object "c:\test file.txt"="C:\Users\layer.shp" | ( object = 10 ),param2

Output5: {"use", "object", "c:\test file.txt", "C:\Users\layer.shp | ( object = 10 )", "param2"}

So to break this down:

Here's the closest regex I've got:

\w+|"[\w\s\:\\\.]*"+([^,]+)

This seems to split the string based on spaces, and by commas after the =. However, it seems to include the = for some reason if one of the first three words is surrounded by quotes. Also, I'm not sure how to split by space only up to the first three words in the string, and the rest by comma if there is an =.

It looks like part of my solution is to use quantifiers with {}, but I've been unable to set it up properly.

Upvotes: 1

Views: 125

Answers (1)

jdweng

Reputation: 34421

Without Regex. Regex should only be used when string methods cannot do the job:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    string[] inputs = {
        "this is test",
        "this is test=param1,param2,param3",
        "use file \"c:\\test file.txt\"=param1 , param2,param3",
        "log off",
        "use object \"c:\\test file.txt\"=\"C:\\Users\\layer.shp\" | ( object = 10 ),param2"
    };

    foreach (string input in inputs)
    {
        List<string> splitArray;
        if (!input.Contains("="))
        {
            // No '=': the whole string is space-separated words.
            splitArray = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
        }
        else
        {
            // Words before the first '=' are split on spaces.
            // Note: a quoted word containing a space is still split here.
            int equalPosition = input.IndexOf("=");
            splitArray = input.Substring(0, equalPosition).Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

            // Everything after the first '=' is split on commas and trimmed.
            string end = input.Substring(equalPosition + 1);
            splitArray.AddRange(end.Split(new char[] { ',' }).Select(p => p.Trim()));
        }
        // Wrap each element in quotes unless it already contains one.
        string output = string.Join(",", splitArray.Select(x => x.Contains("\"") ? x : "\"" + x + "\""));
        Console.WriteLine(output);
    }
    Console.ReadLine();
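
One caveat with plain `Split`: it breaks on every space before the =, so a quoted word such as "c:\test file.txt" ends up as two elements, and any quotes stay in the output. Below is a minimal sketch of a quote-aware variant, still without Regex. The `CommandSplitter` class and its `Split` method are hypothetical names introduced here for illustration. It assumes quotes are always balanced, that quote characters should be dropped from the result (as in Output5 of the question), and that empty elements can be discarded. It does not enforce the three-word limit before the =.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CommandSplitter
{
    // Hypothetical helper (not part of the original answer).
    // Splits on spaces before the first unquoted '=', and on commas after it.
    public static List<string> Split(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;     // true while between a pair of '"'
        bool afterEquals = false;  // true once the first unquoted '=' is seen

        // Emits the token collected so far, if it is non-empty after trimming.
        void Flush()
        {
            string token = current.ToString().Trim();
            if (token.Length > 0)
            {
                tokens.Add(token);
            }
            current.Clear();
        }

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;      // toggle state; the quote itself is dropped
            }
            else if (inQuotes)
            {
                current.Append(c);         // spaces, commas and '=' are literal inside quotes
            }
            else if (!afterEquals && c == '=')
            {
                Flush();
                afterEquals = true;        // switch from space-splitting to comma-splitting
            }
            else if (!afterEquals && c == ' ')
            {
                Flush();
            }
            else if (afterEquals && c == ',')
            {
                Flush();
            }
            else
            {
                current.Append(c);
            }
        }

        Flush();                           // emit whatever is left at the end
        return tokens;
    }
}
```

Calling `CommandSplitter.Split(input)` on each of the five sample inputs should give the element lists from the question. Because the scan is a single pass with two flags, the rule for where to split stays in one place, which may be easier to adjust than a regex if the format changes.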

Upvotes: 2
