Reputation: 265
I have the following main string which contains link Name and link URL. The name and url is combined with #;
. I want to get the string of each link (name and url i.e. My web#?http://www.google.com
), see example below
string teststring = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
and I want to get three different strings using any string function:
Upvotes: 3
Views: 417
Reputation: 6159
If you have any control on the input format, you may want to change it to be easy to parse, for example by using another separator between items, other than space.
If this format can't be changed, why not just implement the split in code? It's not as short as using a RegEx, but it might be actually easier for a reader to understand since the logic is straight forward.
This will almost definitely will be faster and cheaper in terms of memory usage.
An example for code that solves this would be:
static void Main(string[] args)
{
var testString = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
foreach(var x in SplitAndFormatUrls(testString))
{
Console.WriteLine(x);
}
}
private static IEnumerable<string> SplitAndFormatUrls(string input)
{
var length = input.Length;
var last = 0;
var seenSeparator = false;
var previousChar = ' ';
for (var index = 0; index < length; index++)
{
var currentChar = input[index];
if ((currentChar == ' ' || index == length - 1) && seenSeparator)
{
var currentUrl = input.Substring(last, index - last);
yield return currentUrl.Replace("#;", "#?");
last = index + 1;
seenSeparator = false;
previousChar = ' ';
continue;
}
if (currentChar == ';' && previousChar == '#')
{
seenSeparator = true;
}
previousChar = currentChar;
}
}
Upvotes: 1
Reputation: 44259
So this looks like you want to split on the space after a #;
, instead of splitting at #;
itself. C# provides arbitrary length lookbehinds, which makes that quite easy. In fact, you should probably do the replacement of #;
with #?
first:
string teststring = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
teststring = Regex.Replace(teststring, @"#;", "#?");
string[] substrings = Regex.Split(teststring, @"(?<=#\?\S*)\s+");
That's it:
foreach(var s in substrings)
Console.WriteLine(s);
Output:
My web#?http://www.google.com
My Web2#?http://www.bing.se
Handbooks#?http://www.books.se/
If you are worried that your input might already contain other #?
that you don't want to split on, you can of course do the splitting first (using #;
in the pattern) and then loop over substrings
and do the replacement call inside the loop.
Upvotes: 4
Reputation: 13965
If these are constant strings, you can just use String.Substring
. This will require you to count letters, which is a nuisance, in order to provide the right parameters, but it will work.
string string1 = teststring.Substring(0, 26).Replace(";","?");
If they aren't, things get complicated. You could almost do a split with " " as the delimiter, except that your site name has a space. Do any of the substrings in your data have constant features, such as domain endings (i.e. first .com, then .de, etc.) or something like that?
Upvotes: 1