Redg

Reputation: 409

Split String using delimiter that exists in the string

I have a problem and I am wondering whether there is a smart workaround.

I need to pass a string through a socket to a web application. The string has three parts, and I use '|' as a delimiter so that the receiving application can split it back into the three separate parts.

The problem is that '|' can also occur inside any of the three parts, and when it does, the split produces the wrong pieces.

So my question is: is there a way to use a character or string as a delimiter in some text when that character or string might itself appear in the text?

Upvotes: 1

Views: 1978

Answers (7)

Filip Ekberg

Reputation: 36297

Is there a way to use a char/string as a delimiter in some text while this char/string itself might be in the text?

Simple answer: No.

That is, not if the delimiter is used exactly as it is and the text is not modified in any way.

There are, of course, workarounds. One is to use a fixed (or minimum) width for the text between delimiters; this is not perfect, however.

Another is to choose a delimiter (a sequence of characters) that will never occur in your text. This requires changing both the producer and the consumer.

When I need delimiters, I normally choose one that I am 99.9% sure will never occur in normal text; which delimiter I pick depends on what kind of text I expect.
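A minimal Java sketch of the "pick an unlikely delimiter" approach. Using the ASCII unit separator (U+001F) is an assumption made for illustration, not something the answer prescribes, and the class and method names are hypothetical. Instead of trusting the guess blindly, the sender checks every part and refuses to build a message that would be ambiguous.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: DELIM is an assumed "rare" delimiter (ASCII unit separator, U+001F).
class RareDelimiter {
    static final String DELIM = String.valueOf((char) 0x1F);

    static String join(List<String> parts) {
        // Verify the guess: a part containing the delimiter would break the split.
        for (String p : parts) {
            if (p.contains(DELIM)) {
                throw new IllegalArgumentException("Part contains the delimiter: " + p);
            }
        }
        return String.join(DELIM, parts);
    }

    static List<String> split(String message) {
        // split() expects a regex, hence Pattern.quote; limit -1 keeps trailing empty parts.
        return Arrays.asList(message.split(Pattern.quote(DELIM), -1));
    }

    public static void main(String[] args) {
        String message = join(List.of("a|b", "c", ""));
        System.out.println(split(message)); // prints [a|b, c, ]
    }
}
```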

Here's a quote from Wikipedia:

Because delimiter collision is a very common problem, various methods for avoiding it have been invented. Some authors may attempt to avoid the problem by choosing a delimiter character (or sequence of characters) that is not likely to appear in the data stream itself. This ad-hoc approach may be suitable, but it necessarily depends on a correct guess of what will appear in the data stream, and offers no security against malicious collisions. Other, more formal conventions are therefore applied as well.

As a side note on your use case: why not use a protocol for the data you send, such as protobuf (Protocol Buffers)?
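Protobuf itself needs a schema, generated classes and a library. As a dependency-free illustration of the underlying idea (every field carries its own length, so no delimiter is needed), here is a Java sketch that swaps in java.io.DataOutputStream.writeUTF and DataInputStream.readUTF; it is not protobuf, and the class and method names are hypothetical. With a socket, the same streams could wrap socket.getOutputStream() and socket.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of length-prefixed framing: the receiver reads lengths, never searches for a delimiter.
class LengthPrefixed {
    static byte[] encode(List<String> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(parts.size());   // number of fields
            for (String p : parts) {
                out.writeUTF(p);          // 2-byte length + modified UTF-8 (max 65535 bytes)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> parts = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                parts.add(in.readUTF());
            }
            return parts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(List.of("a|b", "|", "c"));
        System.out.println(decode(wire)); // prints [a|b, |, c]
    }
}
```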

Upvotes: 1

flesk

Reputation: 7579

Instead of using | as the delimiter, you could find a delimiter that does not occur in any of the message parts and send it at the beginning of the message. Here's an example using an integer as the delimiter:

String[] parts = {"this is a message", "it's got three parts", "this one's the last"};
String delimiter = null;

// Try 0, 1, 2, ... and take the first number that occurs in none of the parts.
for (int i = 0; i < 100; i++) {
    String s = Integer.toString(i);
    if (parts[0].contains(s) || parts[1].contains(s) || parts[2].contains(s))
        continue;
    delimiter = s;
    break;
}

// Send the delimiter first, separated from the payload by '#'.
String message = delimiter + "#" + parts[0] + delimiter + parts[1] + delimiter + parts[2];

Now the message is 0#this is a message0it's got three parts0this one's the last.

On the receiving end, you start by reading the delimiter and then split the rest of the message on it:

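// Everything before the first '#' is the delimiter; split the rest on it.
// The limit -1 keeps empty trailing parts, which split() would otherwise drop.
String[] tmp = message.split("#", 2);
String[] parts = tmp[1].split(tmp[0], -1);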

It's not the most efficient solution, since it scans the message parts several times, but it's very easy to implement. If no delimiter is found, delimiter stays null and the concatenation writes the text "null" in its place; the receiver then splits on "null", which gives unexpected results if "null" happens to occur in one of the parts.
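One caveat with the loop above: it checks each part on its own, so once all single digits are taken and the delimiter has two digits, it can still collide at a part boundary (the parts "x1" and "y" joined with "11" give "x111y", which splits into "x" and "1y"). A hedged Java variant of the same idea, with hypothetical names, accepts a candidate only if decoding the built message returns the original parts, and throws instead of falling back to "null".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Variant of the "send the delimiter up front" idea with a round-trip check.
class SelfDescribingDelimiter {
    static final int MAX_CANDIDATES = 10_000; // hypothetical upper bound

    static String encode(List<String> parts) {
        for (int i = 0; i < MAX_CANDIDATES; i++) {
            String delimiter = Integer.toString(i);
            String message = delimiter + "#" + String.join(delimiter, parts);
            // Accept only if the receiver would get exactly these parts back;
            // this also rejects collisions across part boundaries.
            if (decode(message).equals(parts)) {
                return message;
            }
        }
        throw new IllegalArgumentException("No usable delimiter found");
    }

    static List<String> decode(String message) {
        String[] tmp = message.split("#", 2); // tmp[0] = delimiter, tmp[1] = payload
        return Arrays.asList(tmp[1].split(Pattern.quote(tmp[0]), -1));
    }

    public static void main(String[] args) {
        List<String> parts = List.of("this is a message", "it's got three parts", "this one's the last");
        String message = encode(parts);
        System.out.println(message); // prints 0#this is a message0it's got three parts0this one's the last
        System.out.println(decode(message));
    }
}
```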

Upvotes: 0

dureuill

Reputation: 2576

The problem is that, given the following string:

string toParse = "What|do you|want|to|say|?";

It can be split into three parts in several ways:

"What
do you
want|to|say|?"

or

"What|do you
want
to|say|?"

and so on...
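To make the ambiguity concrete in Java (a sketch; splitFields is a hypothetical helper): splitting on every '|' gives six parts, and limiting the split to three parts always yields the first of the parsings shown above, which is only correct if the last field is the only one that may contain '|'.

```java
import java.util.Arrays;

class NaiveSplit {
    // String.split takes a regular expression, so '|' must be escaped.
    // limit 0 = split on every '|' (trailing empty strings dropped); limit n > 0 = at most n parts.
    static String[] splitFields(String s, int limit) {
        return s.split("\\|", limit);
    }

    public static void main(String[] args) {
        String toParse = "What|do you|want|to|say|?";
        System.out.println(splitFields(toParse, 0).length); // prints 6
        // Three parts, but always the first parsing: every extra '|' lands in the last part.
        System.out.println(Arrays.toString(splitFields(toParse, 3))); // prints [What, do you, want|to|say|?]
    }
}
```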

You can define rules for parsing your string, but they will be hard to code and will seem counterintuitive to the end user.

The string must contain an escape character that indicates when the symbol "|" is meant literally rather than as the separator, for example "\|".

Here is a full example using a regular expression:

using System;
using System.Text.RegularExpressions;

//... Put this in the Main method of a console application, for instance.
// The '@' prefix makes these verbatim strings, in which '\' is not treated as an escape character.
// Each field is one or more characters that are either not '|' or the escaped pair "\|".
Regex reg = new Regex(@"^((?<string1>([^\|]|\\\|)+)\|)((?<string2>([^\|]|\\\|)+)\|)(?<string3>([^\|]|\\\|)+)$");
string toTest = @"user\|dureuill|deserves|an\|upvote";
MatchCollection matches = reg.Matches(toTest);
if (matches.Count != 1)
{
    throw new FormatException("Badly formatted input.");
}

Match match = matches[0];
// Turn each escaped "\|" back into a plain "|".
string string1 = match.Groups["string1"].Value.Replace(@"\|", "|");
string string2 = match.Groups["string2"].Value.Replace(@"\|", "|");
string string3 = match.Groups["string3"].Value.Replace(@"\|", "|");
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.ReadKey();

Upvotes: 1

Andreas Fester

Reputation: 36650

The general pattern is to escape the delimiter character. For example, when '|' is the delimiter, you could write "||" whenever you need the character itself inside a string (this gets difficult if you allow empty strings, since "||" could then also mean an empty field), or you could use something like '\' as the escape character, so that '|' becomes "\|" and "\" itself becomes "\\".
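A minimal Java sketch of the backslash variant (class and method names are hypothetical): the sender escapes '\' first and then '|', and the receiver scans character by character, treating a backslash as "take the next character literally" and an unescaped '|' as the end of a field. Escaping the backslash itself is what stops a part ending in '\' from swallowing the following delimiter.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedFields {
    // Escape the escape character first, then the delimiter.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("|", "\\|");
    }

    static String join(List<String> parts) {
        List<String> escaped = new ArrayList<>();
        for (String p : parts) {
            escaped.add(escape(p));
        }
        return String.join("|", escaped);
    }

    // A backslash makes the next character literal; an unescaped '|' ends the field.
    // A dangling backslash at the very end is kept as a literal character.
    static List<String> split(String message) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c == '\\' && i + 1 < message.length()) {
                current.append(message.charAt(++i));
            } else if (c == '|') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String message = join(List.of("a|b", "c\\", ""));
        System.out.println(message);        // prints a\|b|c\\|
        System.out.println(split(message)); // prints [a|b, c\, ]
    }
}
```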

Upvotes: 2

user1017882


Maybe adapt the delimiter, if you have the flexibility to do so? For example, instead of String1|String2, the string could read "String1"|"String2".
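A Java sketch of this quoting idea (hypothetical names): each part is wrapped in double quotes, so the effective separator becomes the three-character sequence "|" (quote, pipe, quote). A lone '|' inside a part is then harmless, but a part that itself contains quote, pipe, quote would still collide, so this narrows the problem rather than removing it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class QuotedFields {
    // Separator between quoted parts: quote, pipe, quote.
    static final String SEPARATOR = "\"|\"";

    static String join(List<String> parts) {
        return "\"" + String.join(SEPARATOR, parts) + "\"";
    }

    static List<String> split(String message) {
        // Drop the opening and closing quote, then split on quote-pipe-quote.
        String inner = message.substring(1, message.length() - 1);
        return Arrays.asList(inner.split(Pattern.quote(SEPARATOR), -1));
    }

    public static void main(String[] args) {
        String message = join(List.of("a|b", "c"));
        System.out.println(message);        // prints "a|b"|"c"
        System.out.println(split(message)); // prints [a|b, c]
    }
}
```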

If pipes are unwanted, put some simple validation in place when this string is created or entered.

Upvotes: 0

Justin Harvey

Reputation: 14682

I think you can either

1) Find a character, or a sequence of characters, that will never appear in the strings

or

2) Use fixed-length strings and pad them.
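A Java sketch of option 2 (WIDTH and the names are assumptions chosen for illustration; stripTrailing needs Java 11+): every part is left-justified and padded with spaces to a fixed width, and the receiver cuts the message into WIDTH-sized slices. As written, parts longer than WIDTH are rejected, and trailing spaces that belong to the data are lost when the padding is stripped.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthFields {
    // Assumption: every part fits in 20 characters (hypothetical width).
    static final int WIDTH = 20;

    static String pack(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (p.length() > WIDTH) {
                throw new IllegalArgumentException("Part longer than " + WIDTH + ": " + p);
            }
            // Left-justify and pad with spaces to exactly WIDTH characters.
            sb.append(String.format("%-" + WIDTH + "s", p));
        }
        return sb.toString();
    }

    static List<String> unpack(String message) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i + WIDTH <= message.length(); i += WIDTH) {
            // Remove the padding; this also removes trailing spaces from the data.
            parts.add(message.substring(i, i + WIDTH).stripTrailing());
        }
        return parts;
    }

    public static void main(String[] args) {
        String message = pack(List.of("a|b", "| |", "c"));
        System.out.println(message.length()); // prints 60
        System.out.println(unpack(message));  // prints [a|b, | |, c]
    }
}
```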

Upvotes: 0

TimVK

Reputation: 1146

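Maybe it is useful to encode your strings first, join them with your delimiter, and decode each part after splitting. Note that HTML encoding (HTMLEncode/HTMLDecode) leaves '|' unchanged, so it does not help here; URL encoding, which turns '|' into %7C, does.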
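A minimal Java sketch of the encode-then-join idea using URL encoding (java.net.URLEncoder and URLDecoder with the Charset overloads, Java 10+); the class and method names are hypothetical. Because the encoded form never contains a raw '|', splitting on '|' is unambiguous, and each part is decoded after the split.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodedFields {
    static String join(List<String> parts) {
        List<String> encoded = new ArrayList<>();
        for (String p : parts) {
            // URL encoding turns '|' into "%7C" (and a space into '+').
            encoded.add(URLEncoder.encode(p, StandardCharsets.UTF_8));
        }
        return String.join("|", encoded);
    }

    static List<String> split(String message) {
        List<String> parts = new ArrayList<>();
        // Limit -1 keeps empty trailing parts.
        for (String p : message.split("\\|", -1)) {
            parts.add(URLDecoder.decode(p, StandardCharsets.UTF_8));
        }
        return parts;
    }

    public static void main(String[] args) {
        String message = join(List.of("a|b", "c d"));
        System.out.println(message);        // prints a%7Cb|c+d
        System.out.println(split(message)); // prints [a|b, c d]
    }
}
```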

Upvotes: 0
