I have a sentence that may contain URLs. I need to take any uppercase URL that starts with WWW. and prepend HTTP:// to it. I have tried the following:
    private string ParseUrlInText(string text)
    {
        string currentText = text;
        foreach (string word in currentText.Split(new[] { "\r\n", "\n", " ", "</br>" }, StringSplitOptions.RemoveEmptyEntries))
        {
            string thing;
            if (word.ToLower().StartsWith("www."))
            {
                if (IsAllUpper(word))
                {
                    thing = "HTTP://" + word;
                    currentText = ReplaceFirst(currentText, word, thing);
                }
            }
        }
        return currentText;
    }

    public string ReplaceFirst(string text, string search, string replace)
    {
        int pos = text.IndexOf(search);
        if (pos < 0)
        {
            return text;
        }
        return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
    }

    private static bool IsAllUpper(string input)
    {
        return input.All(t => !Char.IsLetter(t) || Char.IsUpper(t));
    }
However, it only prepends HTTP:// multiple times to the first URL when I try to convert the following:
    WWW.GOOGLE.CO.ZA
    WWW.GOOGLE.CO.ZA WWW.GOOGLE.CO.ZA
    HTTP://WWW.GOOGLE.CO.ZA
    there are a lot of domains (This shouldn't be parsed)
to
    HTTP://WWW.GOOGLE.CO.ZA
    HTTP://WWW.GOOGLE.CO.ZA HTTP://WWW.GOOGLE.CO.ZA
    HTTP://WWW.GOOGLE.CO.ZA
    there are a lot of domains (This shouldn't be parsed)
Could someone please show me the proper way to do this?
Edit: I need to keep the format of the string (spaces, newlines, etc.).
Edit 2: A URL might already have HTTP:// prepended. I've updated the demo.
Upvotes: 0
The issue with your code: you're using a ReplaceFirst method, which does exactly what its name says: it replaces the first occurrence, which is not always the one you want to replace. This is why only your first WWW.GOOGLE.CO.ZA gets all the HTTP:// prefixes.
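To make that concrete, here is a minimal sketch that reuses the ReplaceFirst logic from the question and calls it twice for the same word, the way the loop does when a URL occurs twice. The class name, the PrefixTwice helper and the sample value WWW.A.COM are made up for illustration.

```csharp
using System;

public static class ReplaceFirstDemo
{
    // Same logic as the ReplaceFirst method in the question.
    public static string ReplaceFirst(string text, string search, string replace)
    {
        int pos = text.IndexOf(search, StringComparison.Ordinal);
        if (pos < 0)
        {
            return text;
        }
        return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
    }

    // Hypothetical helper: mimics two loop iterations for the same word.
    public static string PrefixTwice(string text, string word)
    {
        string once = ReplaceFirst(text, word, "HTTP://" + word);
        // The second search finds the word inside the URL that was just
        // prefixed, so the prefix is stacked there instead of on the next URL.
        return ReplaceFirst(once, word, "HTTP://" + word);
    }
}
```

Calling ReplaceFirstDemo.PrefixTwice("WWW.A.COM WWW.A.COM", "WWW.A.COM") gives "HTTP://HTTP://WWW.A.COM WWW.A.COM": the second URL is never touched.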
One approach would be to read the text with a StreamReader or something similar and, each time you reach a new word, check whether its first four characters are "WWW." and, if so, insert the string "HTTP://" at that position. But that is a lot of code for something that can be much shorter...
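For comparison, a rough sketch of that scanning idea could look like the following. It is only an assumption of how it might be done: it walks the string with an index and a StringBuilder rather than a StreamReader, treats only whitespace as a word separator (so "</br>" is not handled), and the names UrlScanner and PrefixByScanning are invented here.

```csharp
using System;
using System.Text;

public static class UrlScanner
{
    public static string PrefixByScanning(string text)
    {
        var result = new StringBuilder(text.Length + 16);
        bool atWordStart = true;
        for (int i = 0; i < text.Length; i++)
        {
            // Only look for "WWW." at the first character of a word.
            if (atWordStart
                && string.CompareOrdinal(text, i, "WWW.", 0, 4) == 0
                && IsAllUpper(text, i))
            {
                result.Append("HTTP://");
            }
            // Every original character is copied, so spaces and newlines survive.
            result.Append(text[i]);
            atWordStart = char.IsWhiteSpace(text[i]);
        }
        return result.ToString();
    }

    // True if the word starting at 'start' contains no lowercase letters.
    private static bool IsAllUpper(string text, int start)
    {
        for (int i = start; i < text.Length && !char.IsWhiteSpace(text[i]); i++)
        {
            if (char.IsLetter(text[i]) && !char.IsUpper(text[i]))
            {
                return false;
            }
        }
        return true;
    }
}
```

A URL that already starts with HTTP:// is left alone simply because its "WWW." is not at the start of the word.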
So let's go Regex!
How to insert characters before a match with Regex (the pattern needs a capture group for $1 to refer to):
    Regex.Replace(input, @"([abc])", "adding_text_before_match$1");
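As a small illustration of the substitution syntax: $1 in the replacement string is filled in from the first capture group, so the pattern needs parentheses, while $& stands for the whole match and needs no group. The class and method names below are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // $1 is replaced by whatever the first capture group matched.
    public static string WithGroup(string input)
    {
        return Regex.Replace(input, @"([abc])", "_$1");
    }

    // $& stands for the entire match, so no capture group is required.
    public static string WholeMatch(string input)
    {
        return Regex.Replace(input, @"[abc]", "_$&");
    }
}
```

Both methods turn "abd" into "_a_bd".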
How to match a word that is not preceded by another word (negative lookbehind):
    (?<!wont_start_with_that)word_to_match
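A tiny sketch of the negative lookbehind on its own, with made-up sample words: "cat" is replaced only where it is not directly preceded by "bob".

```csharp
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // (?<!bob) asserts that the text just before the match is not "bob".
    // The lookbehind itself consumes nothing, so only "cat" is replaced.
    public static string ReplaceFreeCats(string input)
    {
        return Regex.Replace(input, @"(?<!bob)cat", "dog");
    }
}
```

LookbehindDemo.ReplaceFreeCats("cat bobcat") returns "dog bobcat".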
Which leads us to:
    // Requires: using System.Text.RegularExpressions;
    private string ParseUrlInText(string text)
    {
        return Regex.Replace(text, @"(?<!HTTP://)(WWW\.[A-Za-z0-9_\.]+)", @"HTTP://$1");
    }
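If you want to stay closer to the "uppercase only" rule from the question, a stricter variant might look like the sketch below. It is an assumption on my part rather than a tested production pattern: it only accepts uppercase host names, refuses to start in the middle of a word or host name, and leaves a trailing sentence period outside the match. The names UrlPrefixer and PrefixUpperWww are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class UrlPrefixer
{
    private static readonly Regex UpperWww = new Regex(
        @"(?<!HTTP://)" +                 // not already prefixed
        @"(?<![\w.])" +                   // not in the middle of a word or host name
        @"(WWW\.[A-Z0-9_.-]*[A-Z0-9])" +  // uppercase host, ending in a letter or digit
        @"(?!\.?[\w-])");                 // host must end here, so mixed case is skipped

    public static string PrefixUpperWww(string text)
    {
        // Only the matches are rewritten; spaces and newlines stay as they are.
        return UpperWww.Replace(text, "HTTP://$1");
    }
}
```

With the sample text from the question, the three bare WWW.GOOGLE.CO.ZA entries each get a single HTTP:// and the already prefixed one is left alone. Something like WWW.Google.com is not touched at all.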
Upvotes: 2
I'd go for the following, so that:
1) the same element is not handled twice,
2) all instances are replaced in one go.
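    // Requires: using System; using System.Linq;
    private string ParseUrlInText(string text)
    {
        string currentText = text;
        // Distinct() gives us just the unique entries, so each word is handled once.
        var workingText = currentText.Split(new[] { "\r\n", "\n", " ", "</br>" },
            StringSplitOptions.RemoveEmptyEntries).Distinct();
        foreach (string word in workingText)
        {
            if (word.ToLower().StartsWith("www.") && IsAllUpper(word))
            {
                string thing = "HTTP://" + word;
                // A URL at the very start of the text has no separator in front of it.
                if (currentText.StartsWith(word, StringComparison.Ordinal))
                {
                    currentText = thing + currentText.Substring(word.Length);
                }
                // Replace every occurrence that directly follows a separator.
                currentText = currentText.Replace("\r\n" + word, "\r\n" + thing)
                    .Replace("\n" + word, "\n" + thing)
                    .Replace(" " + word, " " + thing)
                    .Replace("</br>" + word, "</br>" + thing);
            }
        }
        return currentText;
    }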
Upvotes: 0