Entity

Reputation: 8202

C# advanced String.Split

I have a string similar to this one:

The boy said to his mother, "Can I have some candy?"

If I do a normal String.Split on it, I get:

{ 'The', 'boy', 'said', 'to', 'his', 'mother', '"Can', 'I', 'have', 'some', 'candy?"' }

I want an array like so:

{ 'The', 'boy', 'said', 'to', 'his', 'mother', 'Can I have some candy?' }

Obviously, I could just loop through character by character and keep track of whether I'm inside a quoted string or not, but is there a better way? With regexes, perhaps?

Upvotes: 8

Views: 863

Answers (2)

Michael Entin

Reputation: 7724

It depends a bit on your requirements. For example, do you need to treat AAA"BBB (no spaces) as a single word or as two words? If AAA"BBB is a single word, and " only starts a quoted field after a delimiter, this looks like a job for a CSV parser. Of course, CSV has other rules, such as doubled quotes meaning a literal quote, but you would need to define similar rules too.

So you can adapt any open-source CSV parser, or see whether, for example, Microsoft.VisualBasic.FileIO.TextFieldParser works for you:

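        // Needs: using System; using System.IO; using System.Linq; using Microsoft.VisualBasic.FileIO;
        string msg = "The boy said to his mother, \"Can I have some candy?\"";
        using (var p = new TextFieldParser(new StringReader(msg)))
        {
            // Split on spaces and commas; quoted fields stay whole (HasFieldsEnclosedInQuotes defaults to true).
            p.Delimiters = new string[] { " ", "," };
            // Adjacent delimiters (", ") produce empty fields, so filter those out.
            foreach (var f in p.ReadFields().Where(f => f != ""))
                Console.WriteLine(f);
        }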

Upvotes: 2

MRAB

Reputation: 20644

How about finding all the matches of this regex:

"[^"]*"|\S+
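
A minimal sketch of how that pattern might be applied in C#, using Regex.Matches. The class name QuoteAwareSplit and the method Split are hypothetical names chosen for illustration, and stripping the surrounding quotes from quoted tokens is an assumption based on the output the question asks for:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class QuoteAwareSplit
{
    // Either a double-quoted run (quotes included) or a run of non-whitespace.
    private static readonly Regex Token = new Regex("\"[^\"]*\"|\\S+");

    public static string[] Split(string input)
    {
        return Token.Matches(input)
                    .Cast<Match>()
                    .Select(m => Unquote(m.Value))
                    .ToArray();
    }

    // Assumption: remove the quotes only when the token is fully quoted.
    private static string Unquote(string token)
    {
        bool quoted = token.Length >= 2
                      && token[0] == '"'
                      && token[token.Length - 1] == '"';
        return quoted ? token.Substring(1, token.Length - 2) : token;
    }
}
```

Calling QuoteAwareSplit.Split on the sentence from the question gives seven tokens, the last being Can I have some candy? without its quotes. Note that the sixth token is mother, with the comma still attached, because \S+ treats punctuation as part of the word; it would need trimming separately if that matters.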

Upvotes: 9
