Reputation: 73
I'm trying to split a string in C#. The string looks like this:
string line = "red,\"\",blue,\"green\",\"blue,orange\",,\"black\",yellow";
The result should be:
string[] result = { "red", "", "blue", "green", "blue,orange", "", "black", "yellow" };
Note that the delimiter is "," but it is ignored inside double quotes. Also note that not every substring between delimiters is surrounded by quotes. I would like an answer where the delimiter can be a string, if possible. I don't mind if the double quotes are included inside the elements of the result array, like:
string[] result = { "red", "\"\"", "blue", "\"green\"", "\"blue,orange\"", "", "\"black\"", "yellow" };
Upvotes: 1
Views: 944
Reputation: 155290
This is a two-state machine that reads the string one character at a time. When it encounters a double quote, it enters a state in which every subsequent character is treated as part of the value until the closing double quote. In the normal state, it accumulates characters into the current value until it reaches a comma, then adds that value to the list of strings to return. The quote characters themselves are not included in the output:
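// Requires: using System; using System.Collections.Generic; using System.IO; using System.Text;
// Declare the enum at type level (C# does not allow enum declarations inside a method).
enum State {
    InQuotes,
    InValue
}

List<String> result = new List<String>();
using( TextReader rdr = new StringReader( line ) ) {
    State state = State.InValue;
    StringBuilder sb = new StringBuilder();
    Int32 nc; Char c;
    while( (nc = rdr.Read()) != -1 ) {
        c = (Char)nc;
        switch( state ) {
        case State.InValue:
            if( c == '"' ) {
                state = State.InQuotes; // opening quote: commas are literal from here on
            } else if( c == ',' ) {
                result.Add( sb.ToString() ); // delimiter: emit the current field
                sb.Length = 0;
            } else {
                sb.Append( c );
            }
            break;
        case State.InQuotes:
            if( c == '"' ) {
                state = State.InValue; // closing quote: back to normal parsing
            } else {
                sb.Append( c );
            }
            break;
        } // switch
    } // while
    // Always emit the last field, even when it is empty; otherwise a trailing
    // comma or a trailing "" would silently drop the final element.
    result.Add( sb.ToString() );
} // using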
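The question also asks for a delimiter that is a string rather than a single character. The following is only a sketch of how the same quote-tracking idea might be extended to that case. The class name `QuotedSplitter` and its `Split` method are made up for illustration, and it assumes the delimiter is non-empty and contains no double quote. It does not handle escaped quotes (`""` inside a quoted field).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class QuotedSplitter
{
    // Splits `line` on `delimiter`, ignoring any delimiter that appears
    // between double quotes. The quote characters are dropped from the output.
    // Assumption: `delimiter` is non-empty and does not contain '"'.
    public static List<string> Split(string line, string delimiter)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("Delimiter must be non-empty.", nameof(delimiter));

        var result = new List<string>();
        var sb = new StringBuilder();
        bool inQuotes = false;
        int i = 0;

        while (i < line.Length)
        {
            char c = line[i];
            if (c == '"')
            {
                // Toggle between the "in quotes" and "in value" states.
                inQuotes = !inQuotes;
                i++;
            }
            else if (!inQuotes &&
                     string.CompareOrdinal(line, i, delimiter, 0, delimiter.Length) == 0)
            {
                // Full delimiter found outside quotes: emit the current field.
                result.Add(sb.ToString());
                sb.Clear();
                i += delimiter.Length;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }

        // Emit the last field, even when it is empty.
        result.Add(sb.ToString());
        return result;
    }
}
```

Calling `QuotedSplitter.Split(line, ",")` on the string from the question should give the eight elements listed there, and a multi-character delimiter such as `"::"` is matched as a whole, so a lone `:` stays part of the field.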
Upvotes: 3