Reputation: 13350
I'm parsing command sequence strings and need to convert each string into a string[] that will contain command tokens in the order that they're read.
The reason is that these sequences are stored in a database to instruct a protocol client to carry out a prescribed sequence for individual remote applications. There are special tokens in these strings that I need to add to the string[] by themselves because they don't represent data being transmitted; instead they indicate blocking pauses.
The sequences do not contain delimiters. There can be any number of special tokens anywhere in a command sequence, which is why I don't think I can simply parse the strings with regex. Also, all of these special commands within the sequence are wrapped with ${}.
Here's an example of the data that I need to parse into tokens (P1 indicates blocking pause for one second):
"some data to transmit${P1}more data here"
Resulting array should look like this:
{ "some data to transmit", "${P1}", "more data here" }
I would think LINQ could help with this, but I'm not so sure. The only solution I can come up with would be to loop through each character until a $ is found, then detect whether a special pause command is present, and then parse the sequence from there using indexes.
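For reference, the index-based scan described above might look roughly like the following. This is only a sketch: the class and method names (`IndexScanner`, `Tokenize`) are made up for illustration, and it assumes that any `${...}` run counts as a special token and that an unterminated `${` should be treated as ordinary data.

```csharp
using System;
using System.Collections.Generic;

public static class IndexScanner
{
    // Hypothetical helper: splits a command sequence into data runs and "${...}" tokens.
    public static List<string> Tokenize(string sequence)
    {
        var tokens = new List<string>();
        int start = 0; // index where the pending data run begins

        while (true)
        {
            // Look for the next token opener at or after the pending data run.
            int open = sequence.IndexOf("${", start, StringComparison.Ordinal);
            if (open < 0) break;

            // Look for the closing brace; if there is none, the rest is plain data.
            int close = sequence.IndexOf('}', open + 2);
            if (close < 0) break;

            // Emit the data before the token (if any), then the token itself.
            if (open > start) tokens.Add(sequence.Substring(start, open - start));
            tokens.Add(sequence.Substring(open, close - open + 1));
            start = close + 1;
        }

        // Whatever is left after the last token is trailing data.
        if (start < sequence.Length) tokens.Add(sequence.Substring(start));
        return tokens;
    }
}
```

Calling `IndexScanner.Tokenize("some data to transmit${P1}more data here")` yields the three elements shown in the example above.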
Upvotes: 3
Views: 720
Reputation: 13350
Using a little bit of Gabe's suggestion, I've come up with a solution that does exactly what I was looking to do:
string tokenPattern = @"(\${\w{1,4}})";
string cmdSequence = "${P}test${P}${P}test${P}${Cr}";
string[] tokenized = (from token in Regex.Split(cmdSequence, tokenPattern)
                      where token != string.Empty
                      select token).ToArray();
With the command sequence in the above example, the array contains this:
{ "${P}", "test", "${P}", "${P}", "test", "${P}", "${Cr}"}
Upvotes: 0
Reputation: 86718
One option is to use Regex.Split(str, @"(\${.*?})") and ignore the empty strings that you get when you have two special tokens next to each other.
Perhaps Regex.Split(str, @"(\${.*?})").Where(s => s != "") is what you want.
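To make that concrete, here is a small self-contained sketch of the same idea. The class and method names (`SplitDemo`, `Tokenize`) are made up for illustration, and the braces in the pattern are escaped purely for readability. Because the pattern is wrapped in a capturing group, Regex.Split keeps the matched tokens in its output rather than discarding them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on "${...}" tokens and keeps them in the result.
    public static string[] Tokenize(string sequence)
    {
        return Regex.Split(sequence, @"(\$\{.*?\})")
                    // Empty strings appear between adjacent tokens, and also when
                    // the sequence starts or ends with a token; drop them.
                    .Where(s => s != "")
                    .ToArray();
    }
}
```

For the sequence in the question, `SplitDemo.Tokenize("some data to transmit${P1}more data here")` gives "some data to transmit", "${P1}" and "more data here", in that order.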
Upvotes: 2
Reputation: 14223
Alright, so as was mentioned in the comments, I suggest you read about lexers. They can do everything you described and more.
Since your requirements are so simple, I'll say that it is not too difficult to write the lexer by hand. Here's some pseudocode that could do it.
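// Requires: using System.Collections.Generic;
IEnumerable<string> Tokenize(string str) {
    var result = new List<string>();
    int start = 0;   // start of the pending plain-data run
    int state = 0;   // 0 = in data, 1 = just saw '$', 2 = inside "${...}"
    int temp = -1;   // index of the '$' that may open a token
    for (int pos = 0; pos < str.Length; pos++) {
        switch (state) {
            case 0:
                if (str[pos] == '$') { state = 1; temp = pos; }
                break;
            case 1:
                if (str[pos] == '{') { state = 2; }
                else if (str[pos] == '$') { temp = pos; }  // "$$": the later '$' may open a token
                else { state = 0; }
                break;
            case 2:
                if (str[pos] == '}') {
                    state = 0;
                    // Emit the data before the token, skipping empty runs.
                    if (temp > start) {
                        result.Add(str.Substring(start, temp - start));
                    }
                    // Substring takes (startIndex, length), not (start, end).
                    result.Add(str.Substring(temp, pos - temp + 1));
                    start = pos + 1;
                }
                break;
        }
    }
    // Anything after the last token is trailing data.
    if (start < str.Length) {
        result.Add(str.Substring(start));
    }
    return result;
}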
Or something like that. I usually get the parameters of Substring wrong on the first try, but that's the general idea.
You can get a much more powerful (and easier to read) lexer by using something like ANTLR.
Upvotes: 1