Reputation: 409
I have a problem and I am wondering if there is any smart workaround.
I need to pass a string through a socket to a web application. This string has three parts and I use the '|' as a delimiter to split at the receiving application into the three separate parts.
The problem is that the '|' character can be a character in any of the 3 separate strings and when this occurs the whole splitting action distorts the strings.
My question therefore is this: Is there a way to use a char/string as a delimiter in some text while this char/string itself might be in the text?
Upvotes: 1
Views: 1978
Reputation: 36297
Is there a way to use a char/string as a delimiter in some text while this char/string itself might be in the text?
Simple answer: No.
This is of course when the string/delimiter is exactly the same, without doing modifications to the text.
There are of course possible workarounds. One possible solution is that you might want to have a minimum/fixed width between delimiters, this is not perfect however.
Another possible solution is to select a delimiter (sequence of characters) that will never occur together in your text. This requires you to change the source and consumer.
When I need to use delimiters I normally select a delimiter that I am 99.9% sure will never occur in normal text, the delimiter may vary depending on what kind of text that I expect.
Here's a quote from Wikipedia:
Because delimiter collision is a very common problem, various methods for avoiding it have been invented. Some authors may attempt to avoid the problem by choosing a delimiter character (or sequence of characters) that is not likely to appear in the data stream itself. This ad-hoc approach may be suitable, but it necessarily depends on a correct guess of what will appear in the data stream, and offers no security against malicious collisions. Other, more formal conventions are therefore applied as well.
Just a side note to your use-case, why not use a protocol for the data that is sent? Such as protobuf?
Upvotes: 1
Reputation: 7579
Instead of using |
as delimiter, you could find a delimiter that's not present in the message parts and pass it along at the beginning of the sent message. Here's an example using an integer as delimiter:
String[] parts = {"this is a message", "it's got three parts", "this one's the last"};
String delimiter = null;
for (int i = 0; i < 100; i++) {
String s = Integer.toString(i);
if (parts[0].contains(s) || parts[1].contains(s) || parts[2].contains(s))
continue;
delimiter = s;
break;
}
String message = delimiter + "#" + parts[0] + delimiter + parts[1] + delimiter + parts[2];
Now the message is 0#this is a message0it's got three parts0this one's the last
.
On the receiving end you start by finding the delimiter and split the message string on that:
String[] tmp = message.split("#", 2);
String[] parts = tmp[1].split(tmp[0]);
It's not the most efficient possible solution, since it requires scanning the message parts several times, but it's very easy to implement. If you don't find a value for delimiter
and null
happens to be part of the message, you might experience unexpected results.
Upvotes: 0
Reputation: 2576
The matter here is that given the following string:
string toParse = "What|do you|want|to|say|?";
It can be parsed in many several ways:
"What
do you
want|to|say|?"
or
"What|do you
want
to|say|?"
and so on...
You can define rules to parse your string, but coding it will be hard, and it will seem counter intuitive to the final user.
The string must contains an escape character that indicates that the symbol "|" is wanted, not the separator. This could be for example "\|".
Here a full example using regex:
using System.Text.RegularExpressions;
//... Put this in the main method of a Console Application for instance.
// The '@' character before the strings are to specify "raw" strings, where escape characters '\' are not escaped
Regex reg = new Regex(@"^((?<string1>([^\|]|\\\|)+)\|)((?<string2>([^\|]|\\\|)+)\|)(?<string3>([^\|]|\\\|)+)$");
string toTest = @"user\|dureuill|deserves|an\|upvote";
MatchCollection matches = reg.Matches(toTest);
if (matches.Count != 1)
{
throw new FormatException("Bad formatted pattern.");
}
Match match = matches[0];
string string1 = match.Groups["string1"].Value.Replace(@"\|", "|");
string string2 = match.Groups["string2"].Value.Replace(@"\|", "|");
string string3 = match.Groups["string3"].Value.Replace(@"\|", "|");
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.ReadKey();
Upvotes: 1
Reputation: 36650
The general pattern is to escape the delimiter character. E.g. when '|' is the delimiter, you could use "||" whenever you need the character itself inside a string (might be difficult if you allow empty strings) or you could use something like '\' as the escape character so that '|' becomes "\|" and "\" itself would be "\\"
Upvotes: 2
Reputation:
Maybe adapt the delimeter if you have the flexibility to do this? So instead of String1|String2 the string could read "String1"|"String2".
If pipes are unwanted - put some simple validation in place during creation/entry of this string?
Upvotes: 0
Reputation: 14682
I think you either
1)Find a character or set of characters together that would never appear in the string
or
2)Use fixed length strings and pad.
Upvotes: 0
Reputation: 1146
Maybe it is useful to HTMLEncode and HTMLDecode your strings first and then attach them together with your delimiter.
Upvotes: 0