mpen

Reputation: 283355

How to split this string on commas, but only if it meets these criteria?

(This is a lot like my last question, but I just realized I was trying to solve the wrong problem)

I'm building something like jQuery, and I'm trying to parse the selectors. So, given something like this:

    a[href="http://weird.url/has,commas"], strong

How can I split this into

    a[href="http://weird.url/has,commas"]
    strong

?

It needs to be split on the comma, but only if the comma isn't inside quotes or inside an attribute selector.


Modified version of max's solution:

    static IEnumerable<string> SplitSelectors(string str)
    {
        int openBrackets = 0;
        int lastIndex = 0;

        for (int i = 0; i < str.Length; ++i)
        {
            switch (str[i])
            {
                case '[':
                    openBrackets++;
                    break;
                case ']':
                    openBrackets--;
                    break;
                case ',':
                    if (openBrackets == 0)
                    {
                        yield return str.Substring(lastIndex, i - lastIndex);
                        lastIndex = i + 1;
                    }
                    break;
            }
        }
        yield return str.Substring(lastIndex);
    }

I'm ignoring quotes, because I don't think they should occur outside of the attribute selector anyway. I'm trying to mimic the jQuery specs, but I'm not entirely sure what they are in this scenario.

Upvotes: 2

Views: 326

Answers (2)

max

Reputation: 34447

    static List<string> SplitByComma(string str)
    {
        bool quoted = false;
        bool attr = false;
        int start = 0;
        var result = new List<string>();
        for(int i = 0; i < str.Length; ++i)
        {
            switch(str[i])
            {
                case '[':
                    if(!quoted) attr = true;
                    break;
                case ']':
                    if(!quoted) attr = false;
                    break;
                case '\"':
                    if(!attr) quoted = !quoted;
                    break;
                case ',':
                    if(!quoted && !attr)
                    {
                        result.Add(str.Substring(start, i - start));
                        start = i + 1;
                    }
                    break;
            }
        }
        if(start < str.Length)
            result.Add(str.Substring(start));
        return result;
    }
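
One case the loop above does not cover is a `]` inside a quoted attribute value, such as `a[href="x],y"]`: quotes are ignored while `attr` is set, so the inner `]` ends the attribute early and the following comma splits the selector. Below is a minimal sketch of a variant that tracks the active quote character everywhere. The names `SelectorSplitter` and `SplitQuoteAware` are hypothetical, and accepting single quotes and trimming each part are assumptions on my side, not something checked against jQuery's behavior. Backslash escapes inside quotes are not handled.

```csharp
using System.Collections.Generic;

static class SelectorSplitter   // hypothetical name, for illustration only
{
    public static List<string> SplitQuoteAware(string str)
    {
        char quote = '\0';   // active quote character, or '\0' when outside quotes
        int depth = 0;       // number of unclosed '[' seen outside quotes
        int start = 0;       // index where the current selector begins
        var result = new List<string>();

        for (int i = 0; i < str.Length; ++i)
        {
            char c = str[i];
            if (quote != '\0')
            {
                // Inside quotes only the matching quote character matters.
                if (c == quote) quote = '\0';
                continue;
            }
            switch (c)
            {
                case '"':
                case '\'':
                    quote = c;   // assumption: single quotes are allowed too
                    break;
                case '[':
                    depth++;
                    break;
                case ']':
                    if (depth > 0) depth--;
                    break;
                case ',':
                    if (depth == 0)
                    {
                        // Trimming is an assumption; drop it to keep whitespace.
                        result.Add(str.Substring(start, i - start).Trim());
                        start = i + 1;
                    }
                    break;
            }
        }
        result.Add(str.Substring(start).Trim());
        return result;
    }
}
```

With this sketch, `SelectorSplitter.SplitQuoteAware("a[href=\"x],y\"], strong")` should give the two parts `a[href="x],y"]` and `strong`.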

Upvotes: 2

Andrew Kennan

Reputation: 14157

You need to parse the string into tokens character by character, keeping track of whether you are within quotes. Something along these lines:

    for each char in text
      if char is quote
        toggle escaped
        add char to token
      else if char is comma
        if escaped = true
          add char to token
        else
          begin new token
      else
        add char to token

where escaped indicates whether you're inside quotes or not.
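
The pseudocode above might translate to C# roughly as follows. This is only a sketch: `QuoteSplitter` and `SplitOutsideQuotes` are hypothetical names, the flag is called `inQuotes` here, and the quote character is appended to the token so the selector text survives intact. It only looks at double quotes and does not track `[` and `]`.

```csharp
using System.Collections.Generic;
using System.Text;

static class QuoteSplitter   // hypothetical name, for illustration only
{
    public static List<string> SplitOutsideQuotes(string text)
    {
        var tokens = new List<string>();
        var token = new StringBuilder();
        bool inQuotes = false;   // plays the role of "escaped" in the pseudocode

        foreach (char c in text)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a quoted run
                token.Append(c);        // keep the quote in the output
            }
            else if (c == ',' && !inQuotes)
            {
                tokens.Add(token.ToString());   // comma outside quotes ends the token
                token.Clear();
            }
            else
            {
                token.Append(c);
            }
        }
        tokens.Add(token.ToString());   // the last token has no trailing comma
        return tokens;
    }
}
```

Unlike the index-based versions, this one builds each token character by character, which makes it easy to drop or transform characters later at the cost of some extra allocation.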

Upvotes: 1
